

TECHNOMETRICS 


A Journal of Statistics 
for the Physical, Chemical 
and Engineering Sciences




AUGUST 1961 


VOLUME 3 NUMBER 3 


Published quarterly by: 


The American Society for Quality Control
and The American Statistical Association





statistical methods to new or novel environments, expository or tutorial papers on particular
statistical methods, and papers dealing with the philosophy and problems of applying
statistical methods to research, development, design and performance. Brief descriptions of
problems requiring solution and short technical notes will also be accepted for publication.
Letters to the Editor, signed by the author and limited in length, will be published when
they are considered timely and appropriate. All papers should contain a short but clear
summary of contents and conclusions, an expository section containing numerical examples
whenever practicable, and appropriate additional sections relating to technical derivations.


Subscription Rates 


The annual subscription rate for members of the American Society for Quality Control
or the American Statistical Association is $6.00 a year. The annual subscription rate for 
non-members is $8.00 a year. 


Members of the sponsoring societies, the American Society for Quality Control and the 
American Statistical Association, may subscribe to the journal while paying their annual 
dues or by check or money order made out to Technometrics and mailed to: 


TECHNOMETRICS 
Post Office Box 587 
Benjamin Franklin Station 
Washington 4, D.C. 


All non-member subscriptions should be mailed to this address. Communications concern- 
ing changes of address, subscriptions, back numbers, etc., should also be sent to this ad- 
dress. Whenever possible, a copy of the address taken from an issue of the periodical 
should accompany a change of address request. All subscription fees are payable in the 
currency of the United States of America. 


Communications concerning membership in either of the sponsoring societies should be sent
to that society.


Second class postage paid at Richmond, Va.







TECHNOMETRICS 


A Journal of Statistics for the Physical, Chemical and Engineering Sciences 


Published Quarterly by 
THE AMERICAN SOCIETY FOR QUALITY CONTROL 
and the 
AMERICAN STATISTICAL ASSOCIATION 


Editor 


J. Stuart Hunter


Associate Editors 
William Allen
G. A. Barnard
C. A. Bennett
David R. Howes
Fred C. Leone
Frank Proschan
Cuthbert Daniel
Besse B. Day
R. J. Hader
Leo Tick
Martin Wilk
Marvin Zelen


Management Committee 
Paul S. Olmstead, Chairman


For the For the 


American Society for Quality Control American Statistical Association 


Warren R. Purcell                      Irving Burr


Maynard Renner
H. L. WeHrety 


CHURCHILL EISENHART 


Donald C. Riley


Published Quarterly in February, May, August and November by the 
Technometrics Management Committee of the American Society for Quality 
Control and the American Statistical Association. Editorial Office: Mathematics 
Research Center, U. S. Army; The University of Wisconsin, Madison, Wisconsin. 
Publication Office: Wm. Byrd Press, P. O. Box 2W, Richmond 5, Virginia. Second 
class mailing privileges granted at Richmond, Virginia. 


Composed and Printed at the 


William Byrd Press, Inc., Richmond, Virginia, U.S.A.





CONTENTS 


TECHNOMETRICS, Vol. 3, No. 3, AUGUST 1961 


The $2^{k-p}$ Fractional Factorial Designs .... G. E. P. Box and J. S. Hunter

Partial Confounding in Fractional Replication .... W. J. Youden

Finding New Fractions of Factorial Experimental Designs .... R. E. Fry

A Study of the Group Screening Method .... G. S. Watson

Missing Values in Response Surface Designs .... Norman R. Draper

The Optimum Allocation of Spare Components in Systems .... Donald F. Morrison

Use of Tables of Percentage Points of Range and Studentized Range .... H. Leon Harter

The Reliability of Components Exhibiting Cumulative Damage Effects .... George H. Weiss

An Analysis of Some Relay Failure Data from a Composite Exponential
Population .... R. R. Prairie and B. Ostle

Applications of Truncated Distributions in Process Start-ups
and Inventory Control .... H. Smith and D. W. Grace

Estimating the Poisson Parameter from Samples that Are Truncated
on the Right .... A. Clifford Cohen, Jr.

Book Reviews .... M. Stone and M. J. R. Healy


Statistical Programs for High Speed Computers 


Notices 









Vol. 3, No. 3          TECHNOMETRICS          August, 1961





The $2^{k-p}$ Fractional Factorial Designs*
Part I. 


G. E. P. Box and J. S. Hunter


Statistics Department, University of Wisconsin and Mathematics Research Center, 
University of Wisconsin 











“Cats is dogs and dogs is dogs and rabbits is dogs, and squirrels in cages is parrots... .” 


1: THE TWO-VERSION FACTORIALS AND FRACTIONALS










A full $2^k$ factorial design requires all combinations of two versions of each of k
variables. If a variable is continuous, the two versions become the high and
low level of that variable. If a variable is qualitative, the two versions correspond
to two types, sometimes the presence and absence of the variable.

The runs comprising the experimental design are conveniently set out in
either of two notations, as illustrated for the eight runs comprising a $2^3$ factorial
in Table 1.



















TABLE 1
Alternative Notations for the $2^3$ Factorial Design

         Notation 1     Notation 2
Run      Variables      Variables
         A  B  C        1  2  3

 1       1              -  -  -
 2       a              +  -  -
 3       b              -  +  -
 4       ab             +  +  -
 5       c              -  -  +
 6       ac             +  -  +
 7       bc             -  +  +
 8       abc            +  +  +





In the first notation the variables are identified by capital letters, and their
two versions by the presence or absence of the corresponding lower case letter.
When all the variables are at their "low" level or version a "1" is used. In the
second notation the variables are identified by numbers and the two versions
of each variable by either a minus and plus sign, or by minus and plus one.
The experimental design can then be viewed geometrically. A run is represented
by a point whose coordinates are the ±1 versions for that run. For example,

* Sponsored by the United States Army under Contract No. DA-11-022-ORD-2059. 


the $2^3$ factorial will provide the eight vertices of a cube in a three-dimensional
coordinate system. The notation using minus and plus signs is used in this
paper. The list of experimental runs is called the design matrix and is denoted
by D. For a $2^k$ factorial, the design matrix contains k columns and N = $2^k$ rows.
There is a column for each of the k variables, and each row gives the combination
of versions for each run.

In Table 1 the runs are listed in standard order. The elements of the first
column are alternate minus and plus signs. The elements of the second column
are alternate pairs of minus and plus signs, the elements of the third column
alternate groups of four minus and plus signs, and so on. The last column consists
of $2^{k-1}$ minus signs followed by $2^{k-1}$ plus signs.
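The standard order just described can be generated mechanically. The following is a minimal sketch in Python, assuming levels are coded as -1 and +1; the function name `full_factorial` is illustrative and not part of the paper.

```python
from itertools import product

def full_factorial(k):
    """Rows of the 2^k design in standard order, entries -1 / +1.

    Column 1 alternates fastest; column k is 2^(k-1) minus signs
    followed by 2^(k-1) plus signs.
    """
    # product() varies its LAST position fastest, so each tuple is reversed
    # to make the FIRST variable the fastest-changing one.
    return [tuple(reversed(levels)) for levels in product((-1, 1), repeat=k)]

design = full_factorial(3)
for row in design:
    print(" ".join("+" if v > 0 else "-" for v in row))
```

For k = 3 this prints the eight sign rows of the second notation, one run per line.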


The Estimates 


On the assumptions that the observations are uncorrelated and have equal
variance, the $2^k$ factorial designs provide independent minimum variance
estimates of the grand average and of the $2^k - 1$ effects:

$$
\begin{array}{ll}
k & \text{main effects,} \\[4pt]
\dfrac{k(k-1)}{2} & \text{two-factor interaction effects,} \\[8pt]
\dfrac{k(k-1)(k-2)}{2 \cdot 3} & \text{three-factor interaction effects,} \\[4pt]
\quad\vdots & \\[4pt]
\dfrac{k(k-1)(k-2)\cdots(k-h+1)}{h!} & h\text{-factor interaction effects,}
\end{array}
\qquad (1)
$$

and finally a single k-factor interaction effect.

Although Yates' Algorithm (1) provides a quick method for calculating
these estimates, a longer but more basic calculation technique will now be
described. In Table 2, where for convenience a $2^3$ design is used, a matrix of
independent variables X is generated from the design matrix D.

TABLE 2

      Design Matrix    Matrix of Independent Variables     Observations
            D                         X                          Y
Run     1   2   3      I   1   2   3   12  13  23  123

 1      -   -   -      +   -   -   -   +   +   +   -             2
 2      +   -   -      +   +   -   -   -   -   +   +            10
 3      -   +   -      +   -   +   -   -   +   -   +             8
 4      +   +   -      +   +   +   -   +   -   -   -            12
 5      -   -   +      +   -   -   +   +   -   -   +             6
 6      +   -   +      +   +   -   +   -   +   -   -             8
 7      -   +   +      +   -   +   +   -   -   +   -             6
 8      +   +   +      +   +   +   +   +   +   +   +             4

In general the individual elements for an ij interaction column in X are
obtained by multiplying the corresponding elements of the separate i and j columns.
Similarly, the elements of the ijk interaction column are given by the product of
the elements of the columns labelled ij and k, and so on. The first column of X
consists entirely of plus signs and is used to provide an estimate of the mean.
For a $2^3$ design the full matrix of independent variables X contains $2^3$ columns
as well as $2^3$ rows. The elements of the column Y in Table 2 are the observations
recorded at each of the $2^3$ experiments. The estimate of the effect ij ... k is
obtained by taking the sum of products between the elements of Y and the
corresponding elements of the column ij ... k and dividing this sum by
N/2 where N = $2^k$, e.g.,

$$
ij \cdots k \ \text{effect} = \frac{2}{N} \sum y\{i \cdot j \cdots k\} \qquad (2)
$$


where $\{i \cdot j \cdots k\}$ stands for the elements of the ij ... k column and the
summation is taken over all N products. Thus, using the data from Table 2, the 1 3
interaction effect is

$$
\mathbf{13} = \tfrac{1}{4}(2 - 10 + 8 - 12 - 6 + 8 - 6 + 4) = -3.0,
$$

where here and henceforth numerals appearing in bold type are used to identify
the main effects and interactions. Solving for all the effects gives:





Main Effects        Two-factor Interactions

1 =  3.0            12 = -2.0
2 =  1.0            13 = -3.0
3 = -2.0            23 = -3.0

Three-factor Interaction

123 = zero
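The sum-of-products calculation of Equation (2) can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' procedure: the helper names are invented here, and the observations are the eight values that appear, in standard order, in the worked 1 3 calculation above.

```python
from itertools import combinations, product
from math import prod

def full_factorial(k):
    """Rows of the 2^k design in standard order (column 1 alternates fastest)."""
    return [tuple(reversed(levels)) for levels in product((-1, 1), repeat=k)]

def estimate_effects(design, y):
    """Return {label: effect} with effect = (2/N) * sum(y * interaction column)."""
    n_runs, k = len(design), len(design[0])
    effects = {}
    for order in range(1, k + 1):
        for cols in combinations(range(k), order):
            # An interaction column is the elementwise product of its parent columns.
            total = sum(obs * prod(row[c] for c in cols)
                        for row, obs in zip(design, y))
            label = "".join(str(c + 1) for c in cols)
            effects[label] = total / (n_runs / 2)
    return effects

y = [2, 10, 8, 12, 6, 8, 6, 4]   # observations in standard order
effects = estimate_effects(full_factorial(3), y)
average = sum(y) / len(y)        # Equation (4): column I divided by N
print(average, effects)
```

The dictionary reproduces the seven effects listed above, and `average` is the grand average 7.0.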


Each estimated effect has variance

$$
\text{Variance (effect)} = 4\sigma^2/N \qquad (3)
$$

where $\sigma^2$ is the variance of the individual observations.


The average is obtained by taking the sum of products of the column I with
the observation column Y and dividing the result by N, thus

$$
\text{average} = \bar{y} = \sum y\{I\}/N. \qquad (4)
$$

Thus $\bar{y} = 56/8 = 7.0$ with variance $\sigma^2/N$. By this process $2^k$ estimates can be
obtained from $2^k$ runs, and when k is large the wealth of such estimates becomes
almost an embarrassment. However, in many practical situations the higher
order interaction effects can hopefully be supposed to be negligible
in size. For example, with continuous variables it is reasonable to expect the
response to vary smoothly. When factorial designs are correctly used to study
qualitative variables it is because certain aspects of similarity are expected
in the responses at the different versions. Thus, two solvents and two differently
shaped particles may with profit be studied in a factorial design when at least
some aspect of similarity in behavior of these variables might be expected.

In the conditions of smoothness and similarity commonly encountered, the
three-factor and multi-factor interaction effects are often negligible. When
this is the case, fractional designs using a smaller number of runs may be
employed, for although in these fractional designs the effects of major interest
are confused with higher order effects, the latter are small enough
to be ignored. In some situations the total number of variables k is large, but
only a few (say p = 2 or 3) are expected to have any effect. In this situation
designs which are fractional in the k variables may be chosen which have the
property that they are complete factorials in any sub-group of p variables.
For illustration, we first discuss the one-half fraction of the $2^4$ design.


One-half Fraction of the $2^4$ Factorial


Since the design is to contain $2^{4-1} = 8$ runs, a $2^3$ factorial design is first written
down. The - and + elements associated with the 1 2 3 interaction column then
are used to identify the - and + versions of variable 4. The resulting eight
combinations shown in Table 3 give a particular half replicate or "fractional"
of the complete $2^4$ design. A $(\tfrac{1}{2})^p$ fraction of a $2^k$ factorial design is called a $2^{k-p}$
fractional, or more exactly, a $2^{k-p}$ fractional factorial. The present design is
therefore a $2^{4-1}$ fractional.
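The construction just described (write down the smaller full factorial, then set the last variable equal to the product of the others) can be sketched as follows. The function `half_fraction` and its `sign` argument are illustrative names introduced here; `sign=-1` gives the complementary half.

```python
from itertools import product

def full_factorial(k):
    """Rows of the 2^k design in standard order (column 1 alternates fastest)."""
    return [tuple(reversed(levels)) for levels in product((-1, 1), repeat=k)]

def half_fraction(k, sign=1):
    """2^(k-1) runs in k variables; last variable = sign * product of the others."""
    rows = []
    for base in full_factorial(k - 1):
        last = sign
        for level in base:
            last *= level
        rows.append(base + (last,))
    return rows

design = half_fraction(4)          # variable 4 identified with the 1 2 3 column
for row in design:
    print(" ".join("+" if v > 0 else "-" for v in row))
```

Every row of `design` has the product of its four entries equal to +1, and the two halves obtained with `sign=1` and `sign=-1` together contain all sixteen runs of the full four-variable factorial.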


TABLE 3
Constructing the $2^{4-1}$ Fractional Factorial Design

        Design Matrix            Observations
  1    2    3    123 = 4              y

  -    -    -       -                8.7
  +    -    -       +               15.1
  -    +    -       +                9.7
  +    +    -       -               11.3
  -    -    +       +               14.7
  +    -    +       -               22.3
  -    +    +       -               16.1
  +    +    +       +               22.1





With a full $2^4$ design, sixteen effects can be estimated: the grand average,
four main effects, six two-factor interactions, four three-factor interactions and
a single four-factor interaction. With only eight observations it is clearly
impossible to obtain sixteen independent estimates. We note that the combination
of observations used to estimate the main effect 4 is identical to that used
to estimate the three-factor interaction effect 1 2 3. The estimates of 4 and
1 2 3 are said to be confounded. The "4" effect really estimates the sum of the
effects of 4 and 1 2 3.

Study of Table 3 will show that other estimates such as 1 2 and 3 4 are also
confounded. It is desirable to have a general method which enables one to determine
which effects are confounded. This is accomplished for this design by inducing
the equality 4 = 1 2 3, where the multiplication product 1 2 3 refers to the
multiplication of the individual elements in the corresponding columns 1, 2 and 3.
Now it is obvious that by multiplying the elements in any column by a column
of identical elements, a column of plus signs will result. Since a column of plus
signs corresponds to I we have $\mathbf{1} \times \mathbf{1} = \mathbf{1}^2 = \mathbf{I}$ and similarly
$\mathbf{2}^2 = \mathbf{I}$, $\mathbf{3}^2 = \mathbf{I}$ and $\mathbf{4}^2 = \mathbf{I}$. This identity supplies the key
to the remaining relationships. On multiplying both sides of the equation
4 = 1 2 3 by 4 we get

$$
\mathbf{4}^2 = \mathbf{1\,2\,3\,4}, \quad \text{that is,} \quad \mathbf{I} = \mathbf{1\,2\,3\,4}. \qquad (5)
$$

This identity is readily confirmed, for if the elements in columns 1, 2, 3 and 4
are multiplied together we obtain a column of plus signs, that is I.

The interaction 1 2 3 4 associated with I is said to be a generator of the design. 
In this particular instance there is only one generator so this provides the defining 
relation I = 1 2 3 4 which is the key to all the relationships which exist between 
the effects. 


Aliases and Linear Combinations of Effects


Suppose we wish to know which effect is confounded with the main
effect 3. Multiplying both sides of the defining relation by 3 gives
$\mathbf{3} = \mathbf{1\,2\,3}^2\,\mathbf{4} = \mathbf{1\,2\,I\,4} = \mathbf{1\,2\,4}$, since multiplication by I (a column of plus
signs) leaves the elements in any column unchanged. Thus, the main effect 3 is
confounded with the three-factor interaction 1 2 4. Similarly, we find that the
two-factor interaction 3 4 is confounded with 1 2, and so on. The quantities so
associated are called aliases. If we now proceed to estimate the main effect 3 we
will in fact obtain the sum of the estimates of the main effect 3 and the
three-factor interaction 1 2 4. The estimate of 3 is really an estimate of the
combination of the effects 3 + 1 2 4. Eight linear combinations of effects
$\ell_I, \ell_1, \ell_2, \ldots$ are available. Thus $\ell_1 = \tfrac{1}{4}\sum y\{1\}$ or equally
$\ell_1 = \tfrac{1}{4}\sum y\{2\,3\,4\}$. Similarly $\ell_{12} = \tfrac{1}{4}\sum y\{1\,2\}$ or equally
$\ell_{12} = \tfrac{1}{4}\sum y\{3\,4\}$. Using the defining relation we find that these
linear combinations estimate the quantities given in Table 4, the subscript on the
$\ell$'s identifying the first effect in the linear combination.


TABLE 4

$\ell_I$ = average + 1234        $\ell_4$ = 4 + 123
$\ell_1$ = 1 + 234               $\ell_{12}$ = 12 + 34
$\ell_2$ = 2 + 134               $\ell_{13}$ = 13 + 24
$\ell_3$ = 3 + 124               $\ell_{14}$ = 14 + 23







The variance of these estimates is $\sigma^2/2$. The average $\bar{y}$ has variance $\sigma^2/8$.
On studying Table 4 we see that the two-factor interactions are mutually

TABLE 5

The eight linear combinations of effects from a $2^{4-1}$ design
with defining relation I = 1 2 3 4

$\ell_I$ = average + 1234 = 15.0        $\ell_4$ = 4 + 123 = 0.8
$\ell_1$ = 1 + 234 = 5.4                $\ell_{12}$ = 12 + 34 = -1.6
$\ell_2$ = 2 + 134 = -0.4               $\ell_{13}$ = 13 + 24 = 1.4
$\ell_3$ = 3 + 124 = 7.6                $\ell_{14}$ = 14 + 23 = 1.0
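The eight linear combinations can be recomputed from the observations of Table 3. The sketch below is illustrative only. It assumes the eight observations in standard order are 8.7, 15.1, 9.7, 11.3, 14.7, 22.3, 16.1 and 22.1; the second value is taken as 15.1 because that is the value implied by the total 8 x 15.0 = 120 of Table 5, and it should be treated as an assumption.

```python
from itertools import product
from math import prod

def full_factorial(k):
    """Rows of the 2^k design in standard order (column 1 alternates fastest)."""
    return [tuple(reversed(levels)) for levels in product((-1, 1), repeat=k)]

# Observations in standard order; the value 15.1 for run 2 is an assumption
# inferred from the Table 5 average (see the text above this block).
y = [8.7, 15.1, 9.7, 11.3, 14.7, 22.3, 16.1, 22.1]
base = full_factorial(3)   # columns 1, 2, 3; column 4 equals the 1 2 3 product

def combination(cols):
    """(1/4) * sum of y times the product of the named base columns."""
    total = sum(obs * prod(row[c - 1] for c in cols) for row, obs in zip(base, y))
    return round(total / 4, 2)

table5 = {
    "I": round(sum(y) / 8, 2),
    "1": combination([1]),
    "2": combination([2]),
    "3": combination([3]),
    "4": combination([1, 2, 3]),   # 4 = 123
    "12": combination([1, 2]),     # 12 = 34
    "13": combination([1, 3]),     # 13 = 24
    "14": combination([2, 3]),     # 14 = 23
}
print(table5)
```

Because variable 4 shares its signs with the 1 2 3 column, only the three base columns are needed; each aliased pair is computed once.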




































































































confounded in pairs, but assuming that the three- and four-factor interactions
are either non-existent or negligible, the estimates $\ell_I$, $\ell_1$, $\ell_2$, $\ell_3$ and $\ell_4$
can be taken to be estimates of the average and the main effects 1, 2, 3 and 4.
If, furthermore, prior knowledge is available that, for example, the 3 4
interaction effect is negligible, then the estimate $\ell_{12}$ can be taken to estimate
the 1 2 interaction effect alone.


The Alternative Fraction 


In the above example, in forming the $2^{4-1}$ design, the factor 4 was associated
with the three-factor interaction 1 2 3. In standard ordering, the elements of
the three-factor interaction column, and hence of factor 4, are

    - + + - + - - +.

The factor 4 can either use these elements as they stand, or it can be associated
with the negative of the 1 2 3 effect, that is, with the elements

    + - - + - + + -.

In the first case 4 = 1 2 3, that is I = 1 2 3 4, and in the second case -4 = 1 2 3,
that is I = -1 2 3 4. The designs for these two $2^{4-1}$ fractional factorials are given
in Table 6. The two parts together constitute a complete $2^4$ factorial design.


TABLE 6
The Design Matrices for the two $2^{4-1}$ Fractional Factorials
with Defining Relations I = 1 2 3 4 and I = -1 2 3 4.

Defining Relation     Observations     Defining Relation     Observations
I = 1 2 3 4                            I = -1 2 3 4


Table 6 shows a further set of observations associated with the second
fraction. In Table 7 eight linear combinations of effects $\ell'_I$, $\ell'_1$, $\ell'_2$, ... associated
with the fraction having defining relation I = -1 2 3 4 are given. If both
fractions are present, then simple addition and subtraction of the $\ell$ and $\ell'$ linear
combinations will provide unconfounded estimates of all the effects. For
example, the main effect 1, unconfounded with the 2 3 4 interaction, is given by
$\tfrac{1}{2}(\ell_1 + \ell'_1) = 5.6$. Similarly, the 2 3 4 interaction unconfounded by the main
effect 1 is obtained from $\tfrac{1}{2}(\ell_1 - \ell'_1) = -0.20$. The average response, when both
fractions are present, is given by $\tfrac{1}{2}(\ell_I + \ell'_I) = 15.6$.

The estimates obtained by taking the sums and differences of the linear 
























TABLE 7
The eight linear combinations of effects from a $2^{4-1}$ design
with defining relation I = -1 2 3 4

$\ell'_I$ = average - 1234 = 16.2       $\ell'_4$ = 4 - 123 = -1.0
$\ell'_1$ = 1 - 234 = 5.8               $\ell'_{12}$ = 12 - 34 = 0.8
$\ell'_2$ = 2 - 134 = -0.2              $\ell'_{13}$ = 13 - 24 = 2.2
$\ell'_3$ = 3 - 124 = 7.8               $\ell'_{14}$ = 14 - 23 = 0.6


combinations computed from the individual fractional factorials are the same
as would be obtained from an analysis of a full $2^4$ design.
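The addition and subtraction of the two sets of linear combinations can be sketched as below. The numerical values are those of Tables 5 and 7 (with 5.4 for the first fraction's combination headed by 1, which is the value consistent with the quoted results 5.6 and -0.20); the dictionary names are illustrative.

```python
# Linear combinations from the fraction I = 1234 (Table 5) and I = -1234 (Table 7).
first = {"I": 15.0, "1": 5.4, "2": -0.4, "3": 7.6,
         "4": 0.8, "12": -1.6, "13": 1.4, "14": 1.0}
second = {"I": 16.2, "1": 5.8, "2": -0.2, "3": 7.8,
          "4": -1.0, "12": 0.8, "13": 2.2, "14": 0.6}

# Alias of each leading effect under the defining word 1234.
alias = {"I": "1234", "1": "234", "2": "134", "3": "124",
         "4": "123", "12": "34", "13": "24", "14": "23"}

estimates = {}
for key in first:
    # Half the sum isolates the leading effect; half the difference isolates its alias.
    estimates[key] = round((first[key] + second[key]) / 2, 2)
    estimates[alias[key]] = round((first[key] - second[key]) / 2, 2)

print(estimates)
```

The result holds sixteen entries, one for the average (key `I`) and one for each of the fifteen effects of the full four-variable factorial.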


The ½ Fractions of the $2^k$ Designs


Any interaction or main effect can be used to split a full $2^k$ factorial into two
half fractions. However, given the assumption that the higher the order of
the interaction the less likely the effect is to occur, there is clearly an advantage
in using the interaction of highest order to make the split. The generator is then
1 2 3 ... k and the defining relation I = 1 2 3 ... k.

The ½ fractions of all the $2^k$ factorial designs are best obtained by first writing
down the design matrix for a full $2^{k-1}$ factorial and then adding the kth variable
by identifying its + and - versions with the + and - signs of the highest
order interaction 1 2 3 ... (k - 1). Thus the $2^{3-1}$ factorial is constructed by
writing down the design matrix for the $2^2$ factorial and then equating variable
3 with the 1 2 interaction. Similarly, the $2^{5-1}$ factorial is given by writing down
the sixteen runs of the $2^4$ and then equating the signs of variable 5 with the
signs of the 1 2 3 4 interaction. The defining relations for these ½ replicate designs
are thus






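$$
\begin{array}{ll}
\text{Design} & \text{Defining Relation} \\
2^{3-1} & \mathbf{I} = \mathbf{1\,2\,3} \\
2^{4-1} & \mathbf{I} = \mathbf{1\,2\,3\,4} \\
2^{5-1} & \mathbf{I} = \mathbf{1\,2\,3\,4\,5}
\end{array}
\qquad (6)
$$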


The extension to the half-replicate designs for k > 5 is obvious. However,
for k > 5 these half-replicate designs permit the estimation of a plethora of
linear combinations of effects, many of which are combinations of higher order
interactions solely. We are therefore interested in still smaller fractions of the
$2^k$ designs, that is, in the $2^{k-p}$ fractional factorials for p > 1. For such designs
there is not one, but p generators which combine to provide the defining
relation. Before discussing these designs, it is profitable first to discuss their areas
of application.





2: AREAS OF APPLICATION 


Fractional designs are of value in a number of different circumstances: 


1) where certain interactions can be assumed non-existent from prior 
knowledge, 

























































































































2) in “screening” situations where it is expected that the effects of all but 
a few of the variables studied will be negligible, 

3) where groups of experiments are run in sequence and ambiguities re- 
maining at a given stage of experimentation can be resolved by later 
groups of experiments, 

4) where certain variables, which may interact, are to be studied simul- 
taneously with other variables whose influence, if any, can be described 
by main effects only. 


Some Interactions Non-Existent, A Priori 


As already noted, when properties of smoothness and similarity exist, inter- 
actions between three or more variables are often negligible. In addition, the 
physical nature of a problem is sometimes such that certain interactions must 
be small or non-existent. In these circumstances we can then use arrangements 
in which the effects expected to be real are confounded only with interactions 
expected to be negligible. For example, in Table 3 the estimate of the 1 2 3 
interaction effect is perfectly confounded with the main effect 4. Under the 
assumption that the three-factor interaction is small, the estimate can be taken 
as the main effect of 4 alone. 

In most practical situations, to say that we assume, a priori, that certain 
effects are negligible would be too strong. Frequently, limitations of time and 
money do not allow the luxury of the certainty obtainable from exploring an 
entirely comprehensive model which allows for every contingency. We tentatively 
entertain the possibility of negligible interactions and try to check assumptions 
as the evidence unfolds. 


Screening Situations 


Situations often occur where not very much is known about the variables
that influence some response. Any subset of a large number of variables might
be important, but which variables form this subset is unknown. Although
usually the number of variables under study will be greater than four, the
application of fractionals to this situation can be illustrated with the $2^{4-1}$ design
given in Table 3. It will be seen that if any one variable out of the four produces
a large effect, then no matter which variable it is, the design may be regarded
as a $2^1$ factorial replicated four times in the important variable. If any two
variables are producing large effects, the design becomes a full $2^2$ factorial
replicated twice in these variables. If any three variables are producing large
effects, again the design becomes a full $2^3$ factorial in these variables. Fractionals
for use in screening situations which are replicated factorials for any number
up to three variables out of sixteen can be obtained using only thirty-two runs.
For picking out the two or three important variables from among a large group
of variables, these designs are very useful.
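The projection property claimed for the eight-run half fraction of Table 3 can be checked by enumeration. The sketch below is illustrative (function and variable names are not from the paper): for every subset of one, two or three of the four variables it counts how often each sign combination occurs.

```python
from collections import Counter
from itertools import combinations, product

def full_factorial(k):
    """Rows of the 2^k design in standard order (column 1 alternates fastest)."""
    return [tuple(reversed(levels)) for levels in product((-1, 1), repeat=k)]

# Half fraction with variable 4 identified with the 1 2 3 interaction column.
design = [row + (row[0] * row[1] * row[2],) for row in full_factorial(3)]

def projection(design, cols):
    """Count how often each sign combination occurs in the chosen columns."""
    return Counter(tuple(row[c] for c in cols) for row in design)

replication = {}
for size in (1, 2, 3):
    for cols in combinations(range(4), size):
        counts = projection(design, cols)
        complete = len(counts) == 2 ** size      # every combination present
        label = "".join(str(c + 1) for c in cols)
        replication[label] = set(counts.values()) if complete else None

print(replication)
```

Each entry records the replication count of the projected factorial: 4 for any single variable, 2 for any pair, and 1 for any triple.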


Sequential Groups of Experiments 


Fractional factorials are of considerable value in the common situation where 
experiments are performed in sequence. Having performed one fraction, the 
results can be reviewed and where there is ambiguity due to the confounding 







of particular estimates, or experimental error, a further group of experiments 
can be selected to resolve the uncertainty. 


Simultaneous Study of "Major" and "Minor" Variables


It sometimes happens that there exists a group of "major" variables whose
study is the chief objective of the investigation. In addition there may be a 
number of “minor” variables which are expected to have negligible effects. 
Fractional designs are available in which both kinds of variables are included 
simultaneously, the main effects and interactions of the major variables esti- 
mated without bias, and the main effects of the minor variables checked. The 
assumption made is that interactions between the minor variables will be 
negligible. 

3: SPECIAL TYPES OF $2^{k-p}$ FACTORIALS


Fractional factorial designs can, for convenience, be divided into types. In 
general the higher the degree of fractionation the more comprehensive the 
assumptions needed to make unequivocal interpretation possible. The following 
three types of designs are discussed: 


(i) Designs of Resolution III in which no main effect is confounded with
any other main effect, but main effects are confounded with two-factor
interactions and two-factor interactions with one another. The $2^{3-1}$
design is of Resolution III.

(ii) Designs of Resolution IV in which no main effect is confounded with
any other main effect or two-factor interaction, but where two-factor
interactions are confounded with one another. The $2^{4-1}$ design is of
Resolution IV.

(iii) Designs of Resolution V in which no main effect or two-factor
interaction is confounded with any other main effect or two-factor interaction,
but two-factor interactions are confounded with three-factor interactions.
The $2^{5-1}$ design is of Resolution V.


In general, a design of resolution R is one in which no p factor effect is confounded 
with any other effect containing less than R — p factors. 

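To identify the resolution of a fractional factorial design, the appropriate
Roman numeral subscript is used. Thus, rewriting Equation (6) along with the
defining relations for both one-half fractions we have

$$
\begin{array}{ll}
\text{Design} & \text{Defining Relations} \\
2_{III}^{3-1} & \mathbf{I} = \pm\mathbf{1\,2\,3} \\
2_{IV}^{4-1} & \mathbf{I} = \pm\mathbf{1\,2\,3\,4} \\
2_{V}^{5-1} & \mathbf{I} = \pm\mathbf{1\,2\,3\,4\,5}
\end{array}
\qquad (7)
$$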


In the above a word refers to a combination of elements such as 1 2 3, 1 2 3 4. 
In general the resolution of a design is equal to the smallest number of characters 
in any word appearing in the defining relation. 


4: RESOLUTION III DESIGNS


Designs of resolution III are available which require only N runs to study up
to N - 1 variables, where N is a multiple of four. We first discuss the
arrangements for which N is a power of two. Particularly important designs are those
for testing three variables in four runs, seven variables in eight runs and fifteen
in sixteen runs. Two-level designs for studying eleven variables in twelve runs,
nineteen variables in twenty runs, etc., are derived by a somewhat different
method due to Plackett & Burman (6), and are described later.

Designs for studying k = N - 1 variables in N runs may be called saturated
designs. We introduce these designs by first considering a fractional for testing
k = 7 variables in N = 8 runs. The complete factorial would require $2^7 = 128$
runs. We are considering therefore a one-sixteenth (i.e., a $2^{-4}$) fractional, that
is, a $2_{III}^{7-4}$ design. Since the design uses $2^3 = 8$ runs, we start construction of the
design matrix with the $2^3$ factorial, and then associate four additional variables
with the plus and minus signs of the four interaction columns. For example,
we may set

$$
\mathbf{4} = \mathbf{1\,2}, \quad \mathbf{5} = \mathbf{1\,3}, \quad \mathbf{6} = \mathbf{2\,3}, \quad \mathbf{7} = \mathbf{1\,2\,3} \qquad (8)
$$

to obtain the following $2_{III}^{7-4}$ design.


TABLE 8
The Design Matrix for a $2_{III}^{7-4}$ Design

  1    2    3    4 = 12    5 = 13    6 = 23    7 = 123

  -    -    -      +         +         +         -
  +    -    -      -         -         +         +
  -    +    -      -         +         -         +
  +    +    -      +         -         -         -
  -    -    +      +         -         -         +
  +    -    +      -         +         -         -
  -    +    +      -         -         +         -
  +    +    +      +         +         +         +


The identifications in Equation (8) provide the generating relations

$$
\mathbf{I} = \mathbf{1\,2\,4}, \quad \mathbf{I} = \mathbf{1\,3\,5}, \quad \mathbf{I} = \mathbf{2\,3\,6}, \quad \mathbf{I} = \mathbf{1\,2\,3\,7} \qquad (9)
$$

associated with the generators 1 2 4, 1 3 5, 2 3 6 and 1 2 3 7. Now clearly, if
I = 1 2 4 and I = 1 3 5 then also $\mathbf{I} = \mathbf{1\,2\,4} \times \mathbf{1\,3\,5} = \mathbf{1}^2\,\mathbf{2\,3\,4\,5} = \mathbf{2\,3\,4\,5}$.
Whence it follows, for example, that 2 3 and 4 5 are confounded. Thus, when there
is more than one generator, the defining relation must contain not only the
relations provided by the generators themselves, but all those obtained from all their
possible products. The complete defining relation for this $2_{III}^{7-4}$ design is obtained,
for example, by taking the generators first one at a time and then multiplying
them together in all possible ways. Taking them one at a time gives
I = 1 2 4 = 1 3 5 = 2 3 6 = 1 2 3 7. Multiplying them together two at a time gives

    I = 2 3 4 5 = 1 3 4 6 = 3 4 7 = 1 2 5 6 = 2 5 7 = 1 6 7,

three at a time gives:

    I = 4 5 6 = 1 4 5 7 = 2 4 6 7 = 3 5 6 7,

and finally, four at a time gives:

    I = 1 2 3 4 5 6 7.








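The complete defining relation for this $2_{III}^{7-4}$ design is therefore

$$
\begin{aligned}
\mathbf{I} &= \mathbf{124} = \mathbf{135} = \mathbf{236} = \mathbf{1237} = \mathbf{2345} = \mathbf{1346} = \mathbf{347} \\
&= \mathbf{1256} = \mathbf{257} = \mathbf{167} = \mathbf{456} = \mathbf{1457} = \mathbf{2467} = \mathbf{3567} \\
&= \mathbf{1234567}.
\end{aligned}
\qquad (10)
$$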
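The rule of taking the generators one, two, three and four at a time can be sketched directly. The code below is illustrative and assumes single-digit variable labels; it represents each word as a set of symbols so that squared symbols cancel, and also reports the length of the shortest word.

```python
from functools import reduce
from itertools import combinations

def multiply(word_a, word_b):
    """Product of two words; a symbol appearing in both cancels (x * x = I)."""
    return frozenset(word_a) ^ frozenset(word_b)

def name(word):
    """Render a word as sorted digits, with the empty word shown as I."""
    return "".join(sorted(word)) or "I"

generators = ["124", "135", "236", "1237"]

# Products of the generators taken one, two, three and four at a time.
words = [name(reduce(multiply, group, frozenset()))
         for size in range(1, len(generators) + 1)
         for group in combinations(generators, size)]

shortest = min(len(word) for word in words)   # smallest number of symbols in any word
print("I = " + " = ".join(words))
print("shortest word length:", shortest)
```

With p generators the list contains $2^p - 1$ words, here fifteen, and the shortest of them has three symbols.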









As before, the defining relation quickly provides the alias structure for any 
effect, that is, indicates which effects are confounded. For example, multiplying 
the defining relation through by 1 we obtain 


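$$
\begin{aligned}
\mathbf{1} &= \mathbf{24} = \mathbf{35} = \mathbf{1236} = \mathbf{237} = \mathbf{12345} = \mathbf{346} = \mathbf{1347} = \mathbf{256} \\
&= \mathbf{1257} = \mathbf{67} = \mathbf{1456} = \mathbf{457} = \mathbf{12467} = \mathbf{13567} = \mathbf{234567}.
\end{aligned}
$$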





Thus the interactions 2 4, 3 5, 1 2 3 6 etc., are seen to be aliases of, or confounded 
with, the main effect 1. Similarly, multiplying through by 1 2 3 we obtain 


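$$
\begin{aligned}
\mathbf{123} &= \mathbf{34} = \mathbf{25} = \mathbf{16} = \mathbf{7} = \mathbf{145} = \mathbf{246} = \mathbf{1247} \\
&= \mathbf{356} = \mathbf{1357} = \mathbf{2367} = \mathbf{123456} = \mathbf{23457} \\
&= \mathbf{13467} = \mathbf{12567} = \mathbf{4567}.
\end{aligned}
$$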











Thus the three-factor interaction 1 2 3 is an alias of, or confounded with, 3 4,
2 5, 1 6, etc. Since the resolution is determined by the smallest number of
symbols forming any word in the defining relation, the design is of resolution
III, as we have already noted.

In this example, if we write $\ell_1 = \tfrac{1}{4}\sum y\{1\}$, $\ell_2 = \tfrac{1}{4}\sum y\{2\}$, etc., and if we
assume that all interactions between three or more variables are negligible,
then by repeated use of the defining relation we obtain:







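$$
\begin{aligned}
\ell_I &= \text{average} \\
\ell_1 &= \mathbf{1} + \mathbf{24} + \mathbf{35} + \mathbf{67} \\
\ell_2 &= \mathbf{2} + \mathbf{14} + \mathbf{36} + \mathbf{57} \\
\ell_3 &= \mathbf{3} + \mathbf{15} + \mathbf{26} + \mathbf{47} \\
\ell_4 &= \mathbf{4} + \mathbf{12} + \mathbf{56} + \mathbf{37} \\
\ell_5 &= \mathbf{5} + \mathbf{13} + \mathbf{46} + \mathbf{27} \\
\ell_6 &= \mathbf{6} + \mathbf{23} + \mathbf{45} + \mathbf{17} \\
\ell_7 &= \mathbf{7} + \mathbf{34} + \mathbf{25} + \mathbf{16}.
\end{aligned}
\qquad (11)
$$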
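The two-factor aliases of each main effect follow from multiplying the effect through the complete defining relation and discarding every product of three or more symbols. A minimal sketch under the same assumptions as the paper (interactions of three or more variables negligible); the helper names are illustrative:

```python
from functools import reduce
from itertools import combinations

def multiply(word_a, word_b):
    """Product of two words; a symbol appearing in both cancels (x * x = I)."""
    return frozenset(word_a) ^ frozenset(word_b)

def name(word):
    return "".join(sorted(word)) or "I"

generators = ["124", "135", "236", "1237"]
defining_words = [reduce(multiply, group, frozenset())
                  for size in range(1, len(generators) + 1)
                  for group in combinations(generators, size)]

def short_aliases(effect, max_order=2):
    """Aliases of `effect` containing at most `max_order` symbols, sorted."""
    products = (multiply(effect, word) for word in defining_words)
    return sorted(name(p) for p in products if len(p) <= max_order)

alias_table = {str(v): short_aliases(str(v)) for v in range(1, 8)}
for effect, partners in alias_table.items():
    print(effect, "+", " + ".join(partners))
```

Each main effect is accompanied by exactly three two-factor interactions, listed here in sorted order rather than in the order printed in the paper.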




























The Alternative Fractions


In writing down the design matrix for the $2_{III}^{7-4}$ fractional, the variables 4, 5, 6
and 7 were identified positively with the elements of the interactions 1 2, 1 3,
2 3 and 1 2 3 respectively. However, each of these identifications could have
been made with either a plus or minus sign. For example, instead of associating
the variable 4 positively with the interaction 1 2, that is taking

    + - - + + - - +

for its elements, the variable 4 could be associated negatively with the elements
of 1 2, that is:

    - + + - - + + -.

The first association gives 4 = 1 2 or equivalently I = 1 2 4. The second
association yields 4 = -1 2 or equivalently I = -1 2 4. We could, in fact, have used
any one of the sixteen identifications corresponding to the sixteen possible
choices of signs

$$
\mathbf{I} = \pm\mathbf{1\,2\,4}, \quad \mathbf{I} = \pm\mathbf{1\,3\,5}, \quad \mathbf{I} = \pm\mathbf{2\,3\,6}, \quad \mathbf{I} = \pm\mathbf{1\,2\,3\,7}. \qquad (12)
$$


The sixteen possible identifications give the sixteen individual fractions which
together yield the complete $2^7$ design. In composing the defining relation for
any one of the sixteen designs the usual rules of algebraic multiplication determine
the signs in the defining relation and hence in the alias pattern.

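Another one of these sixteen fractions is, for example, that in which variables
5 and 6 are associated with the elements of the interaction vectors 1 3 and 2 3
taken negatively. The generators for this design are:

$$
\mathbf{1\,2\,4}, \quad -\mathbf{1\,3\,5}, \quad -\mathbf{2\,3\,6}, \quad \mathbf{1\,2\,3\,7}, \qquad (13)
$$

and the corresponding defining relation is:

$$
\begin{aligned}
\mathbf{I} &= \mathbf{124} = -\mathbf{135} = -\mathbf{236} = \mathbf{1237} = -\mathbf{2345} = -\mathbf{1346} = \mathbf{347} \\
&= \mathbf{1256} = -\mathbf{257} = -\mathbf{167} = \mathbf{456} = -\mathbf{1457} = -\mathbf{2467} \\
&= \mathbf{3567} = \mathbf{1234567}.
\end{aligned}
$$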


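Assuming as before that all interactions between three or more variables are
negligible, we see that this fraction allows the estimation of eight somewhat
different combinations of effects

$$
\begin{aligned}
\ell'_I &= \text{average} \\
\ell'_1 &= \mathbf{1} + \mathbf{24} - \mathbf{35} - \mathbf{67} \\
\ell'_2 &= \mathbf{2} + \mathbf{14} - \mathbf{36} - \mathbf{57} \\
\ell'_3 &= \mathbf{3} - \mathbf{15} - \mathbf{26} + \mathbf{47} \\
\ell'_4 &= \mathbf{4} + \mathbf{12} + \mathbf{56} + \mathbf{37} \\
\ell'_5 &= -\mathbf{5} + \mathbf{13} - \mathbf{46} + \mathbf{27} \\
\ell'_6 &= -\mathbf{6} + \mathbf{23} - \mathbf{45} + \mathbf{17} \\
\ell'_7 &= \mathbf{7} + \mathbf{34} - \mathbf{25} - \mathbf{16}
\end{aligned}
\qquad (14)
$$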












THE 2*—P FRACTIONAL FACTORIAL DESIGNS 





323 


where the use of the prime notation on the ¢’s indicates only that some alterna- 
tive function is under consideration. We see that Eq. (14) is identical to Eq. (11) 
with the numerals 5 and 6 having minus instead of plus signs. 


Families of Fractionals 


In the above example, there are 2^4 = 16 different 2^{7-4}_{III} designs, each design
corresponding to a particular choice of signs from among the generators
±1 2 4, ±1 3 5, ±2 3 6 and ±1 2 3 7. When the generators of a fractional
factorial design associated with the identity I all have positive signs, they are
called the principal generators. The defining relation obtained by multiplying
out the generators is similarly called the principal defining relation, and the
corresponding fractional factorial the principal fraction. Individual member
fractions obtained from changes of sign in the generators are said to belong to
the same family. In general, a 2^{k-p} fractional factorial design will have p
generators, and the 2^p ways of allocating plus and minus signs to the generators
will produce the 2^p different fractions belonging to the same family.

In general, a 2^{k-f} design will have f independent generators G_1, G_2, ..., G_f.
An independent generator is such that it cannot be obtained by multiplying
together the other generators, and is identified by the original association
adopted in writing down the design. A defining relation for a particular fraction
will contain 2^f words obtained by multiplying out (I + G_1)(I + G_2) ... (I + G_f).
The 2^f different fractions have defining relations given by the 2^f different ways
of allocating plus and minus signs in this product. The defining relation for the
principal fraction is given when all signs are plus. The alias pattern for any of
the non-principal fractions is simply obtained by making the appropriate changes
of sign in the alias pattern for the principal fraction.


Resolution III Designs Containing 16 and 32 Runs


The principal fraction of the 2^{15-11}_{III} design is obtained by first writing down
the sixteen runs of the complete 2^4 design and then associating an additional
eleven variables with the interactions 1 2, 1 3, 1 4, 2 3, 2 4, 3 4, 1 2 3, 1 2 4,
1 3 4, 2 3 4, and 1 2 3 4. Similarly, the thirty-two runs comprising the 2^{31-26}_{III}
factorial are obtained by writing down the complete factorial for five variables
and then equating the additional twenty-six new variables with the interactions
between the original five variables.


Effect of Dropping Variables 


For intermediate values of k resolution III designs may be obtained by omitting
variables from the resolution III design of next higher order. For example,
to test six factors in eight runs we can use the 2^{7-4}_{III} design dropping out any
one column in its design matrix. The alias relationships remain the same except
that all words containing the characters associated with the dropped variables
are omitted from the alias structure, and from any estimates of linear
combinations. For example, dropping the columns 3 and 5 from the design matrix for
the 2^{7-4}_{III} fractional given in Table 8 yields the 2^{5-2}_{III} design shown in Table 9.
We can select the variables to be dropped out so that the most satisfactory
alias arrangements exist among those remaining.
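
The rule for the alias structure after dropping variables can be sketched as follows. This is an illustrative Python fragment, assuming the principal generators 1 2 4, 1 3 5, 2 3 6 and 1 2 3 7; the helper names `product_word`, `defining_relation` and `after_dropping` are hypothetical.

```python
from itertools import combinations

PRINCIPAL_GENERATORS = ["124", "135", "236", "1237"]

def product_word(words):
    """Product of several words; a numeral appearing twice cancels."""
    numerals = set()
    for w in words:
        numerals ^= set(w)
    return "".join(sorted(numerals))

def defining_relation(generators):
    """Every product of one or more generators."""
    return [product_word(combo)
            for r in range(1, len(generators) + 1)
            for combo in combinations(generators, r)]

def after_dropping(words, dropped):
    """Keep only the words that contain none of the dropped numerals."""
    return [w for w in words if not set(w) & set(dropped)]

words = defining_relation(PRINCIPAL_GENERATORS)
print(after_dropping(words, "35"))  # ['124', '167', '2467']
```

Dropping 3 and 5 leaves the three words 1 2 4, 1 6 7 and 2 4 6 7, the defining relation quoted for Table 9.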
























































































































TABLE 9
Design Matrix 2^{5-2}_{III}
Defining Relation I = 1 2 4 = 1 6 7 = 2 4 6 7.

[Design matrix: the eight runs of Table 8 with columns 3 and 5 omitted, leaving
columns 1, 2, 4, 6 and 7.]


Although it is true that a fractional of resolution R in a reduced number
of k - d variables can always be obtained by omitting d variables from a k
variable fractional of resolution R, nevertheless a particular design obtained
in this manner does not necessarily provide the best arrangement possible.
For instance, if we drop variables 3, 5, 6 and 7 from the principal fraction
2^{7-4}_{III} design with generators 1 2 4, 1 3 5, 2 3 6 and 1 2 3 7 we are left with a
design in the three variables 1, 2 and 4 along with the unresolved generator
1 2 4 and hence the defining relation appropriate to a design having only four
runs. On inspection we find that our eight factor combinations in the three
remaining variables consist of two replications of the four run half-replicate
design defined by I = 1 2 4. This design is of resolution III, of course, but in
many cases we
would prefer to use the eight runs to perform a full factorial in the three
variables retained. A full factorial would have been obtained had we, for
example, dropped variables 4, 5, 6 and 7, since no word of the defining relation
is made up of the numerals 1, 2 and 3 alone.
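
Whether a particular choice of retained variables gives a replicated half fraction or a full factorial can be checked by counting distinct factor combinations. The sketch below assumes the principal fraction with 4 = 1 2, 5 = 1 3, 6 = 2 3 and 7 = 1 2 3; the function names are illustrative.

```python
from itertools import product

def principal_fraction():
    """Eight runs of the principal fraction, one dict of +1/-1 levels per run."""
    return [{1: x1, 2: x2, 3: x3, 4: x1 * x2, 5: x1 * x3, 6: x2 * x3, 7: x1 * x2 * x3}
            for x3, x2, x1 in product((-1, 1), repeat=3)]

def distinct_runs(kept):
    """Distinct factor combinations left when only the variables in `kept` remain."""
    return {tuple(run[v] for v in kept) for run in principal_fraction()}

print(len(distinct_runs((1, 2, 4))))  # 4: the half replicate I = 1 2 4, run twice
print(len(distinct_runs((1, 2, 3))))  # 8: a full factorial in three variables
```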

The defining relation for the design obtained after dropping d variables will
contain all those words in the original defining relation which do not contain
any of the dropped numerals. Suppose among the f generators of the original
design there are d generators that contain dropped variables, and f - d generators
that do not. A set of generators for the derived design will contain all the f - d
generators not containing dropped variables together with the largest set of
independent products not containing dropped variables which can be found by
multiplying the remaining generators.

For example, consider again the resolution III design with generators
G_1 = 1 2 4, G_2 = 1 3 5, G_3 = 2 3 6 and G_4 = 1 2 3 7. Suppose variable 1 is
dropped. Since G_3 = 2 3 6 does not contain 1 this generator will be included in
the generators for the derived design. From the remaining generators we can
obtain the products G_1G_2, G_1G_4 and G_2G_4, none of which contains the dropped
variable 1. Only two of the three products may be used since, having taken two
of them, the third may be obtained by multiplication. For example,
G_1G_2 · G_1G_4 = G_2G_4. In general, a group of p words (such as the products we
are considering here) are said to be independent if no one of them can be obtained
by multiplying together some subset of the remaining p - 1. In this example
then, a set of generators for the design derived after dropping 1 are 2 3 6,
2 3 4 5 and 3 4 7 (that is, G_3, G_1G_2 and G_1G_4).

At best, the effect of dropping d variables is to produce a design having d 
fewer generators. However, this represents the maximum reduction in generators 
possible, and particular choices of dropped variables may produce a smaller 
reduction in the number of generators. Of course, the greater the number of 
generators, the more words there will be in the defining relation and corre- 
spondingly, the more aliases for the remaining effects. 


Effect of Combining Fractions from the Same Family 


If we take the original fraction of the 2^{7-4}_{III} together with the second fraction
in which the signs of 5 and 6 are switched, and take one-half the sums and
differences of the respective linear combinations of effects we can estimate the
following quantities (assuming all interactions with more than two factors to be nil).


From 1/2 the Sums                          From 1/2 the Differences
1/2(l_0 + l'_0) = Grand average            1/2(l_0 - l'_0) = Block effect
1/2(l_1 + l'_1) = 1 + 24                   1/2(l_1 - l'_1) = 35 + 67
1/2(l_2 + l'_2) = 2 + 14                   1/2(l_2 - l'_2) = 36 + 57
1/2(l_3 + l'_3) = 3 + 47                   1/2(l_3 - l'_3) = 15 + 26             (15)
1/2(l_4 + l'_4) = 4 + 12 + 56 + 37         1/2(l_4 - l'_4) = higher order interactions
1/2(l_5 + l'_5) = 13 + 27                  1/2(l_5 - l'_5) = 5 + 46
1/2(l_6 + l'_6) = 23 + 17                  1/2(l_6 - l'_6) = 6 + 45
1/2(l_7 + l'_7) = 7 + 34                   1/2(l_7 - l'_7) = 25 + 16


In general when two fractions from the same family are combined, the sums 
and differences of the corresponding linear combinations of the effects determine 
the effects which can be estimated from the combined design. The “block effect’’ 
referred to in Eq. (15) is the difference in average level between the first and 
second groups of eight runs. 
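
The sums and differences of Eq. (15) can be reproduced mechanically. The sketch below is an illustration only: it assumes the principal fraction with 4 = 1 2, 5 = 1 3, 6 = 2 3 and 7 = 1 2 3, treats effects of three or more factors as nil, and uses hypothetical helper names (`linear_combinations`, `half`).

```python
from itertools import combinations

# Each variable of the 2^(7-4) design is a product of the basic columns 1, 2, 3.
BASE = {1: {1}, 2: {2}, 3: {3}, 4: {1, 2}, 5: {1, 3}, 6: {2, 3}, 7: {1, 2, 3}}
COLUMN_OF = {frozenset(b): v for v, b in BASE.items()}

def linear_combinations(switched=()):
    """Map contrast index j to {effect: sign} for main effects and two-factor interactions."""
    combos = {j: {} for j in BASE}
    effects = [(v,) for v in BASE] + list(combinations(BASE, 2))
    for effect in effects:
        column, sign = set(), 1
        for v in effect:
            column ^= BASE[v]          # product of the basic columns
            if v in switched:
                sign = -sign           # a switched variable reverses the sign
        j = COLUMN_OF[frozenset(column)]
        combos[j]["".join(map(str, effect))] = sign
    return combos

def half(a, b, op):
    """One-half the sum (op = +1) or difference (op = -1) of two linear combinations."""
    out = {}
    for effect in a:
        coefficient = (a[effect] + op * b[effect]) // 2
        if coefficient:
            out[effect] = coefficient
    return out

l = linear_combinations()
l_prime = linear_combinations(switched=(5, 6))
for j in sorted(l):
    print(j, half(l[j], l_prime[j], +1), half(l[j], l_prime[j], -1))
```

An empty difference, as for contrast 4, corresponds to the entry "higher order interactions" in Eq. (15).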


Combining Fractional Factorials to Separate Effects 


The procedure of adding fractions in sequence with suitably switched signs 
provides a useful method for the systematic isolation and confirmation of 
important effects in multi-variable systems. The method is very flexible and 
can be used in different ways as different situations unfold. 

Mention will be made of two particular uses of this device: (1) the addition 
of a second fraction in which the signs in a single column are switched and, (2) 
the addition of a second fraction in which the signs in all the columns are switched. 


Switching Signs for a Single Variable 


Suppose a fractional factorial is generated by switching the signs associated
with only the variable 1 in the 2^{7-4}_{III} factorial given in Table 8. Then the linear
combinations that can be estimated from this fraction (given that the three-factor
and higher order interactions are negligible) are the following


l'_0 = Average
l'_1 = -1 + 24 + 35 + 67
l'_2 = 2 - 14 + 36 + 57
l'_3 = 3 - 15 + 26 + 47
l'_4 = 4 - 12 + 56 + 37
l'_5 = 5 - 13 + 46 + 27
l'_6 = 6 + 23 + 45 - 17
l'_7 = 7 + 34 + 25 - 16


Combining this fraction with the principal fraction, the following linear
combinations of effects are obtained from the combined design


From 1/2(l + l')         From 1/2(l - l')

Average                  Block effect
24 + 35 + 67             1
2 + 36 + 57              14
3 + 26 + 47              15
4 + 56 + 37              12
5 + 46 + 27              13
6 + 23 + 45              17
7 + 34 + 25              16

We see that by adding to a fraction a further fraction with the signs for a single 
variable reversed, we isolate the main effect of that variable together with all 
of its two-factor interactions. Given any fractional of resolution III or higher 
and a second fractional identical to the first except that the signs of a single 
variable are switched, then the combined design will provide estimates of the 
main effect of the switched variable and all its associated two-factor interactions 
unbiased by any other main effect or two-factor interaction. 
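
This property can be checked numerically by stacking the two fractions and testing the columns of signs for orthogonality. The following Python sketch assumes the principal fraction with 4 = 1 2, 5 = 1 3, 6 = 2 3 and 7 = 1 2 3, and ignores the block effect; all names are illustrative.

```python
from itertools import combinations, product

def principal_fraction():
    """Eight runs of the principal fraction, one dict of +1/-1 levels per run."""
    return [{1: x1, 2: x2, 3: x3, 4: x1 * x2, 5: x1 * x3, 6: x2 * x3, 7: x1 * x2 * x3}
            for x3, x2, x1 in product((-1, 1), repeat=3)]

def switch(runs, variables):
    """Reverse the signs of the listed variables in every run."""
    return [{v: -x if v in variables else x for v, x in run.items()} for run in runs]

combined = principal_fraction() + switch(principal_fraction(), {1})

def column(effect):
    """Column of signs for a main effect (v,) or a two-factor interaction (v, w)."""
    out = []
    for run in combined:
        sign = 1
        for v in effect:
            sign *= run[v]
        out.append(sign)
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

effects = [(v,) for v in range(1, 8)] + list(combinations(range(1, 8), 2))
with_1 = [e for e in effects if 1 in e]
without_1 = [e for e in effects if 1 not in e]

# Effects involving variable 1 are orthogonal to every other main effect and
# two-factor interaction, and also to one another.
clear = all(dot(column(a), column(b)) == 0 for a in with_1 for b in without_1)
separate = all(dot(column(a), column(b)) == 0 for a, b in combinations(with_1, 2))
print(clear, separate)
```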


Switching Signs for All Variables 


By switching signs for all seven variables given in the principal fraction we
can estimate the following linear combinations


l'_0 = Average
l'_1 = -1 + 24 + 35 + 67
l'_2 = -2 + 14 + 36 + 57
l'_3 = -3 + 15 + 26 + 47
l'_4 = -4 + 12 + 56 + 37
l'_5 = -5 + 13 + 46 + 27
l'_6 = -6 + 23 + 45 + 17
l'_7 = -7 + 34 + 25 + 16


By combining this fraction with the principal fraction all the main effects can
be estimated clear of all the two-factor interactions. The two-factor interactions
in turn will associate themselves in groups of three in accordance with the
following scheme


From 1/2(l + l')         From 1/2(l - l')

Average                  Block effect
24 + 35 + 67             1
14 + 36 + 57             2
15 + 26 + 47             3
12 + 56 + 37             4                                  (19)
13 + 46 + 27             5
23 + 45 + 17             6
34 + 25 + 16             7






















This is a special example of a general principle, [14], which states that if any 
fractional is replicated with reversed signs, then all alias links between main 
effects and two-factor interactions are broken. 
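
For the 2^{7-4}_{III} example the principle can be seen from the defining relation alone: reversing every sign multiplies a word of n numerals by (-1)^n, so only words with an even number of numerals hold in both halves of the combined design. The Python sketch below illustrates this under the assumption of the principal generators 1 2 4, 1 3 5, 2 3 6 and 1 2 3 7; the helper names are hypothetical.

```python
from itertools import combinations

def product_word(words):
    """Product of several words; a numeral appearing twice cancels."""
    numerals = set()
    for w in words:
        numerals ^= set(w)
    return "".join(sorted(numerals))

def defining_relation(generators):
    return [product_word(combo)
            for r in range(1, len(generators) + 1)
            for combo in combinations(generators, r)]

words = defining_relation(["124", "135", "236", "1237"])

# Words of odd length change sign in the reversed fraction and drop out;
# words of even length keep their sign and define the combined design.
kept = [w for w in words if len(w) % 2 == 0]
resolution = min(len(w) for w in kept)
print(kept, resolution)
```

The shortest surviving word has four numerals, so no main effect is aliased with a two-factor interaction in the sixteen-run design.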

It should be noticed that although there are 2^7 = 128 ways of switching
signs, there are only 2^4 = 16 of these switches that result in different designs.
This must be so since there are only 2^4 different 2^{7-4} fractions belonging to the
same family. It is easily confirmed by actual trial that the same design can be
produced by a number of alternative sign switching arrangements, although
the order in which the experimental runs appear may be different. The situation
is made clear by considering only the generating relations for the principal
fraction of the 2^{7-4}_{III}, that is:


I = 1 2 4, I = 1 3 5, I = 2 3 6, I = 1 2 3 7.


It will be obvious for example that switching the signs of variables 4, 5, 7, or of
variable 1, produces exactly the same effect. In each case the generating relations
are:


I = -1 2 4, I = -1 3 5, I = 2 3 6, I = -1 2 3 7.
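
The count of sixteen distinct designs among the 128 sign switches can be confirmed by enumeration. This is a small Python sketch; the function `generator_signs` and the use of strings of numerals for sets of switched variables are illustrative assumptions.

```python
from itertools import combinations

GENERATORS = ("124", "135", "236", "1237")

def generator_signs(switched):
    """Sign of each generating relation after switching the listed variables."""
    return tuple((-1) ** sum(ch in switched for ch in g) for g in GENERATORS)

variables = "1234567"
patterns = {}
for r in range(len(variables) + 1):
    for subset in combinations(variables, r):
        patterns.setdefault(generator_signs(subset), []).append("".join(subset))

print(len(patterns), "distinct sets of generating relations")
print(generator_signs("1"), generator_signs("457"))
```

Each of the sixteen sign patterns is produced by eight different switching arrangements, of which switching 1 and switching 4, 5 and 7 are two.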


Generators for Aggregate Designs 





Suppose the principal fraction of the 2^{7-4}_{III} given in Table 8 is run. The
generating relations for this design are


I_8 = 1 2 4, I_8 = 1 3 5, I_8 = 2 3 6, I_8 = 1 2 3 7











where the notation I_8 refers to a column of eight plus signs. Now suppose we
perform a further series using a second 2^{7-4}_{III} from the same family as, for
example, the fraction in which the variable 1 is run with reversed signs. The
combined design formed from the two pieces is now a 2^{7-3} factorial. Since it is a
one-eighth replicate, it will have three generators, not four. How can these
generators be identified? We note now that the generators for the second fraction
are

I_8 = -1 2 4, I_8 = -1 3 5, I_8 = 2 3 6, I_8 = -1 2 3 7.

It is clear that the generator 2 3 6 must be one of the generators for the combined
design for in both pieces of the design I_8 = 2 3 6. Consequently if I_16 represents
the column of sixteen plus signs associated with the complete design, then
also I_16 = 2 3 6.

In asking what are the generating relations for the complete design we must
first ask the question, "For which combinations are the products of the
elements everywhere equal to I_16?" Now we observe that 1 2 3 7 has the value
I_8 in the first set of eight runs, and -I_8 in the second set. Thus, 1 2 3 7 is not
equal to I_16 and is therefore not a generator of the combined design. Similarly,
1 2 4 and 1 3 5 also are not generators for the complete design.
Now clearly for the first part of the design

I_8 = (1 2 4)(1 3 5) = 2 3 4 5,

and also for the second part

I_8 = (-1 2 4)(-1 3 5) = 2 3 4 5.

Thus it is true for the complete design that

I_16 = 2 3 4 5.

Similarly, multiplying 1 2 4 by 1 2 3 7 it is true for the complete design that

I_16 = 3 4 7.

A third product is possible, obtained by multiplying 1 3 5 by 1 2 3 7 to give

I_16 = 2 5 7.


Now (2 3 4 5)(3 4 7) = 2 5 7 and since it is a property of generators that no
individual generator can be obtained from the others, we include in the new
set of generators any two of the three derived above. Thus, the generating
relations for this 2^{7-3} design are


I_16 = 2 3 6, I_16 = 2 3 4 5, I_16 = 3 4 7

and the corresponding defining relation is

I_16 = 2 3 6 = 2 3 4 5 = 3 4 7 = 4 5 6 = 2 4 6 7 = 2 5 7 = 3 5 6 7.

From the above it will be seen that a general rule for finding generators for
a design derived from two fractions from the same family each defined by
generating relations of the kind


Fraction 1: I = ±A = ±B = ±C = ...
Fraction 2: I = ±A = ±B = ±C = ...


is as follows:


    Suppose there are U words of unlike sign and L words of like sign in
    the two identities. Then U + L - 1 words which are generators of the
    new design will contain the L words of like sign together with U - 1
    words obtained as independent even products of the U words of unlike
    sign.


In the above an even product is a product between an even number of words
(usually two). This rule can be applied quite generally not only for combining
designs of resolution III, but for combining any pair of fractionals belonging to
the same family. As a further example, suppose two 2^{7-4}_{III} fractions were combined
with generating relations:


I = -1 2 4, I = -1 3 5, I = 2 3 6, I = -1 2 3 7

and

I = -1 2 4, I = 1 3 5, I = -2 3 6, I = 1 2 3 7.


(The first fraction can be obtained from the principal fraction of the 2^{7-4}_{III} by
reversing the sign of variable 1, the second fraction by reversing the signs of
variables 1 and 3.) Then the generators for the complete design are -1 2 4
and any two of the three words obtained from the even products of -1 3 5,
2 3 6 and -1 2 3 7 to give the generating relations:


I = -1 2 4, I = -1 2 5 6, I = 2 5 7.


The reader will notice that switching to an alternative set of permissible
generators leaves the design unchanged for it produces the same defining relation.
Thus, in the above, if we had used the generators -1 2 4, -1 2 5 6 and -1 6 7
the defining relation obtained by multiplying out these generators would have
been identical to that obtained before.
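
The rule can be written out as a short routine. The Python sketch below is one way of applying it, not the authors' procedure: it takes as the U - 1 even products the first word of unlike sign multiplied by each of the others, which is only one of several permissible choices. The names `multiply` and `aggregate_generators` are hypothetical.

```python
def multiply(w1, w2):
    """Multiply two signed words given as (sign, numerals) pairs."""
    (s1, a), (s2, b) = w1, w2
    return (s1 * s2, "".join(sorted(set(a) ^ set(b))))

def aggregate_generators(fraction1, fraction2):
    """Generators of the design formed from two fractions of one family.

    Each fraction is a list of (sign, word) pairs with the words in the same order.
    """
    pairs = list(zip(fraction1, fraction2))
    like = [g1 for g1, g2 in pairs if g1[0] == g2[0]]
    unlike = [g1 for g1, g2 in pairs if g1[0] != g2[0]]
    # U - 1 independent even products: the first unlike word times each of the others.
    products = [multiply(unlike[0], g) for g in unlike[1:]]
    return like + products

principal = [(1, "124"), (1, "135"), (1, "236"), (1, "1237")]
switch_1 = [(-1, "124"), (-1, "135"), (1, "236"), (-1, "1237")]
switch_1_and_3 = [(-1, "124"), (1, "135"), (-1, "236"), (1, "1237")]

print(aggregate_generators(principal, switch_1))
print(aggregate_generators(switch_1, switch_1_and_3))
```

The first call returns 2 3 6, 2 3 4 5 and 3 4 7; the second returns -1 2 4, -1 2 5 6 and 2 5 7, in agreement with the two examples in the text.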


Alternative Choice of Generators 


A particular fractional has a unique defining relation for a given design.
There are however a number of different but equivalent choices of generators
all of which lead to the same defining relation and the same design. Therefore,
although we may speak of the defining relation for a design, we should properly
refer to a choice of generators. In general, suppose G_1, G_2, ..., G_f are a set of
generators, necessarily independent, for a particular design. Then any other set
of f independent generators derived by multiplication will be equivalent and will
produce the same defining relation and be associated with the same design. The
generators satisfy the same rules of multiplication as before, that is,
G_1^2 = G_2^2 = ... = G_f^2 = I. To see that this is so suppose that G_1 = 1 2 3, then
G_1^2 = 1^2 2^2 3^2 = I. If we have four generators G_1, G_2, G_3 and G_4 for a particular
design, then G_1G_2, G_1G_3, G_1G_4 and G_1G_2G_3 will be an alternative set of
generators, but G_1G_2, G_1G_3, G_1G_4 and G_1G_2G_3G_4 will not since
G_1G_2 · G_1G_3 · G_1G_4 = G_1G_2G_3G_4. In particular, suppose we are interested in
the fully saturated 2^{7-4}_{III} design with generators G_1 = 1 2 4, G_2 = 1 3 5,
G_3 = 2 3 6 and G_4 = 1 2 3 7, then the first legitimate alternative set of
generators will give 2 3 4 5, 1 3 4 6, 3 4 7 and 4 5 6 whereas the second
"illegitimate" choice gives 2 3 4 5, 1 3 4 6, 3 4 7 and 1 2 3 4 5 6 7, for it is
readily confirmed that the last generator is the product of the first three.
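
Independence of a proposed set of generators can be tested by elimination, treating each word as a vector of zeros and ones (one position per numeral) and multiplication of words as addition modulo 2. The sketch below is an illustration; the bit-mask representation and the names `to_mask` and `rank` are assumptions.

```python
def to_mask(word):
    """Represent a word such as '2345' by an integer with one bit per numeral."""
    mask = 0
    for ch in word:
        mask |= 1 << int(ch)
    return mask

def rank(words):
    """Number of independent words, by elimination over the integers modulo 2."""
    basis = []  # reduced masks, each with its own leading bit
    for mask in map(to_mask, words):
        for b in basis:
            # mask ^ b is smaller than mask exactly when mask contains b's leading bit
            mask = min(mask, mask ^ b)
        if mask:
            basis.append(mask)
    return len(basis)

legitimate = ["2345", "1346", "347", "456"]
illegitimate = ["2345", "1346", "347", "1234567"]
print(rank(legitimate), rank(illegitimate))
```

A set of four words is a permissible set of generators for this design only if its rank is four.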












































































































































Combining Fractionals Not of the Same Family 


We have seen how by switching signs, fractional factorials may be com- 
bined together to isolate particular effects of interest, and that when fractional 
designs have the same generators except for their signs they are classified as 
being from the same family. Another method for isolating effects that is often 
of value is to combine fractions which are not of the same family. In one interest- 
ing species the numbers are switched in the generators as well as the signs. Possi- 
bilities arising from designs of this sort are presently being investigated. 


Blocking Designs of Resolution III 


Frequently an experimenter may fear that his results may be upset by shifts 
in average performance that occur from day to day, or with different batches 
of raw material. Such systematic sources of variation can often be successfully 
eliminated without biasing the estimates of the effects, or inflating the error 
variance by grouping the runs into “blocks’’. 

The resolution III designs can be broken into two blocks of equal size by
identifying the two blocks with the + and - versions of a single variable.
For example, using the principal fraction of the 2^{7-4}_{III} design with generators
1 2 4, 1 3 5, 2 3 6 and 1 2 3 7 and using variable 7 for blocking we have the
design given in Table 10. This design is a 2^{6-3}_{III} in blocks of four runs each.


TABLE 10

[Design matrix: the principal 2^{7-4}_{III} fraction in the variables 1 to 6, with the
eight runs divided into Block 1 and Block 2 according to the version of variable 7.]


The generators for the design can now be written

1 2 4, 1 3 5, 2 3 6 and 1 2 3 B (20)


where the letter B replaces the numeral 7 in the last generator to indicate the
blocking variable. Assuming that three factor and higher order interactions are
negligible, the defining relation for this design shows that the six main effects
1, 2, 3, ..., 6 are each confounded with three two-factor interactions, one of
which is a two-factor interaction with the blocks. The block effect itself is
confounded with three two-factor interactions among the variables. In general,
any resolution III design can be broken into two blocks of equal size by selecting
the + and - signs of any one of the variables in the design matrix to identify
the two blocks.

Resolution III designs can be broken into four blocks of equal size by
identifying two block variables B_1 and B_2 with the + and - versions of two of the
variables. For example, starting with the principal fraction of the 2^{7-4}_{III} design
and using variables 6 and 7 for blocking we obtain the four blocks of two runs
each as illustrated in Table 11. Among these four blocks there are three degrees
of freedom associated with the main effects and the two-factor interaction
of pseudo-block variables B_1, B_2 identified with the two-way table whose margins
are the - and + versions of the two block variables:


          -        +

   -     1,2      3,4

   +     5,6      7,8


TABLE 11

[Design matrix of the principal 2^{7-4}_{III} fraction with 1 = B_1B_2, 6 = B_1 and
7 = B_2, showing for each run the levels of the variables and of the block
variables B_1, B_2 and B_1B_2.]


The pairs of numbers in the cells of the table denote the runs comprising the
four individual blocks. The "interaction variable" B_1B_2 has precisely the same
importance as the main effects B_1 and B_2. We see that on associating B_1
with variable 6 and B_2 with variable 7 we automatically associate a comparison
between blocks, that is, the interaction B_1B_2, with the interaction 6 7. In this
particular example 1 = 6 7 and hence the plus and minus signs of column 1
are now no longer available to accommodate an experimental variable. The
variable 1, therefore, is dropped from the experimental design. Thus, using
variables 6, 7 and 6 7 to identify the four blocks we obtain the design in the
variables 2, 3, 4 and 5 in four blocks of two, as shown in Table 12.

It should be noted here that the two runs comprising each block are “mirror 
images” of one another, that is, within a block the versions of one run are exactly 
reversed in the second run. We will later see that this attribute of blocks of 
size two has important consequences. 
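
The mirror-image property can be verified directly. The Python sketch below assumes the principal fraction with 4 = 1 2, 5 = 1 3, 6 = 2 3 and 7 = 1 2 3, takes B_1 = 6 and B_2 = 7, and looks only at the retained variables 2, 3, 4 and 5; the variable names are illustrative.

```python
from itertools import product

blocks = {}
for x3, x2, x1 in product((-1, 1), repeat=3):
    run = {1: x1, 2: x2, 3: x3, 4: x1 * x2, 5: x1 * x3, 6: x2 * x3, 7: x1 * x2 * x3}
    key = (run[6], run[7])                       # block label (B_1, B_2)
    blocks.setdefault(key, []).append(tuple(run[v] for v in (2, 3, 4, 5)))

for key, (first, second) in sorted(blocks.items()):
    print(key, first, second)

# Within every block the second run is the first with all signs reversed.
mirror = all(tuple(-x for x in a) == b for a, b in blocks.values())
print(mirror)
```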

TABLE 12
The 2^{4-1}_{IV} in four blocks of two runs each

[Design matrix in the variables 2, 3, 4 and 5, the eight runs arranged in four
blocks of two.]


It is usually assumed that block variables corresponding to such characteristics
as the time of day, batches of raw material, operators, etc., do not interact with
the experimental variables. As always it is wise to regard this as a supposition
to be tentatively entertained. (Unchecked assumptions are never safe in an
applied subject.) We can be reminded of our supposition by setting out the
analysis as if the supposition were not true, that is, by taking B_1, B_2 and B_1B_2
as if they were capable of interacting with the variables. If we treat B_1, B_2
and B_1B_2 on the same basis as the experimental variables, we have for the
generators of the design given in Table 12:


B_1B_2 2 4, B_1B_2 3 5, B_1 2 3. (21)


As mentioned above, the "interaction" B_1B_2 between the pseudo block factors
B_1 and B_2 represents a contrast between the blocks which is on exactly the same
footing as B_1 or B_2. (A mere relabeling of the blocks could change the
"interaction" contrast to a main effect contrast B_1.) Consequently, the combination
B_1B_2 must be treated as a group having the same status as a single variable. In
particular, a word such as B_1B_2 1 must count as a two-factor interaction (between
variable 1 and one of the block contrasts) and not as a three-factor interaction.

The generators given in Equation 21 can be used to construct the defining
relation for the design. Assuming all three-factor and higher order interactions
negligible we obtain the linear combinations of effects given in Table 13.

TABLE 13
Linear Combinations of Effects Provided by 2^{4-1}_{IV} in Four Blocks of Two Runs Each.

l_0 = Average
l_1 = B_1B_2 + (24 + 35)
l_2 = 2 + (B_1B_2 4 + B_1 3 + B_2 5)
l_3 = 3 + (B_1B_2 5 + B_1 2 + B_2 4)
l_4 = 4 + (B_1B_2 2 + B_1 5 + B_2 3)
l_5 = 5 + (B_1B_2 3 + B_1 4 + B_2 2)
l_6 = B_1 + (23 + 45)
l_7 = B_2 + (34 + 25)


The bracketed values in Table 13 indicate the two-factor interactions which
could, if they existed, bias the various effects. Usually of course, all interactions
with blocks can be safely supposed to be negligible. On this assumption the
main effects of the variables 2, 3, 4 and 5 in this design are clear of two-factor
interactions and the design given in Table 12 is in fact of resolution IV, that
is, a 2^{4-1}_{IV} fractional in four blocks of two runs each.


The Plackett and Burman Designs 


The methods given here allow us to construct resolution III designs suitable
for exploring k = 3 variables in N = 4 runs, k = 7 in N = 8, k = 15 in N = 16
and k = 31 in N = 32 runs. It was pointed out by Plackett and Burman in
1946 [6] that two-version designs which gave uncorrelated estimates of first
order effects were available for exploring k = N - 1 variables in N runs where
N was any multiple of four, and they presented the design matrices for these
designs for N = 4 up to 100 (except for the isolated case of N = 92). When N is a
power of two, the designs provided by Plackett and Burman are identical with
one or the other of the families of resolution III designs derived by the methods
given above. For the cases N = 12, 20, 24, 28 and 36 however, the Plackett and
Burman designs allow useful gaps to be filled and are presented below.

The rows of plus and minus signs given in Table 14A are used to
construct the design matrices for N = 12, 20, 24 and 36 while the design
matrix for N = 28 is constructed from the nine rows shown in Table 14B.


TABLE 14A

k = 11  N = 12   + + - + + + - - - + -
k = 19  N = 20   + + - - + + + + - + - + - - - - + + -
k = 23  N = 24   + + + + + - + - + + - - + + - - + - + - - - -
k = 35  N = 36   - + - + + + - - - + + + + + - + + + - - + - - - - + - + - + + - - + -

TABLE 14B 

k =27 N = 28 

A B C 

+-—-++4+4+4+---—- —p- HK = SH — + ++ —+—-++—- + 
+t+t=—+t+4t+4--—- -~=<-+t+--t-- —~—+¢++4+-7+4+- 
-~-++4+4+4++--- t-—---—-+--4t- t+-—t-+7t- ++ 
—-—<—-=<—+=-++4+ + —-=—+F+-—-+--- + P= FE — he + 
= = + ~oe ee --—- FF - Fr -—--—- Ft tt 
—<<-=-— +7 7 + + —-=$-—-=—-- F=- —-++4+-+-++¢ 
+++—--—--+-+ ——F-—=—-+-+-—- -— + += ++ FM 
Fete ee tt — rH oe Se = + ++t—+ttF-- ++ 
+++—----++ =—+=-- -—-+-- —~++—-—-4+++4+-- 





To construct the designs for N = 12, 20, 24 and 36 the plus and minus signs
appearing in the appropriate row of Table 14A are first written down as a
column. A second column is obtained from the first by moving down the elements
of the first column once, and placing the last element in first position. This
procedure is then repeated, moving down the second column one element to
produce the third, and so on until all columns are obtained. Finally a row of
minus signs is added to complete the design. Thus, for the case of k = 11 variables
in N = 12 runs, the design of Table 14C is obtained.


TABLE 14C

Run    1  2  3  4  5  6  7  8  9 10 11

 1     +  -  +  -  -  -  +  +  +  -  +
 2     +  +  -  +  -  -  -  +  +  +  -
 3     -  +  +  -  +  -  -  -  +  +  +
 4     +  -  +  +  -  +  -  -  -  +  +
 5     +  +  -  +  +  -  +  -  -  -  +
 6     +  +  +  -  +  +  -  +  -  -  -
 7     -  +  +  +  -  +  +  -  +  -  -
 8     -  -  +  +  +  -  +  +  -  +  -
 9     -  -  -  +  +  +  -  +  +  -  +
10     +  -  -  -  +  +  +  -  +  +  -
11     -  +  -  -  -  +  +  +  -  +  +
12     -  -  -  -  -  -  -  -  -  -  -


To construct the design for k = 27, N = 28 the three blocks A, B and C
illustrated in Table 14B are written down cyclically


A B C
C A B
B C A


and these twenty-seven rows followed by a row of minus signs.
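
The cyclic construction for N = 12 can be sketched in Python as follows. The fragment uses the N = 12 row of Table 14A and checks that the resulting columns are balanced and mutually orthogonal; the variable names are illustrative.

```python
ROW_N12 = "++-+++---+-"          # the k = 11, N = 12 row of Table 14A

first_column = [1 if ch == "+" else -1 for ch in ROW_N12]
k = len(first_column)

# Column j is the first column moved down j places, the elements pushed off
# the bottom reappearing at the top.
columns = [[first_column[(i - j) % k] for i in range(k)] for j in range(k)]

# Transpose to runs and add the final row of minus signs.
design = [[columns[j][i] for j in range(k)] for i in range(k)] + [[-1] * k]

for run in design:
    print(" ".join("+" if x > 0 else "-" for x in run))

def column(j):
    return [run[j] for run in design]

balanced = all(sum(column(j)) == 0 for j in range(k))
orthogonal = all(sum(a * b for a, b in zip(column(i), column(j))) == 0
                 for i in range(k) for j in range(i + 1, k))
print(balanced, orthogonal)
```

Orthogonality of every pair of columns is what gives the uncorrelated estimates of first order effects mentioned above.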


An Example 


In the start up of a new manufacturing unit considerable difficulty was ex- 
perienced at the filtration stage. Other similar units operated satisfactorily at 
other sites but this particular new unit, although apparently similar in most 
major respects to the other units, gave a crude product which required very 
much longer filtration times. A meeting was called to discuss possible explana- 
tions and to consider ways of curing the trouble. The following variables were 
proposed as being possibly responsible. 

(1) The water supply: The new plant used piped water from the local mu- 
nicipal reservoir. An alternative but somewhat limited supply of water was 
available from a local well. It was proposed that the effect of changing to the 
well water should be tried since it was argued that the well water corresponded 
more closely to the water used at other sites. 

(2) Raw Material: The raw material used was manufactured on the site 
and it was suggested that this might be in some way deficient. It was proposed 
that raw material which had been satisfactorily used in the manufacturing of 
the product at another site should be shipped in and tested locally. 

(3) Temperature of Filtration: This was not thought to be a critical factor
over the range involved and no special attempt to control this temperature
had been made. However, the physical arrangement of the new process was
such that filtration was accomplished at a somewhat lower temperature than
had been experienced at other plants. By temporarily covering pipes and
equipment, provision could be made to raise the temperature to the level
experienced elsewhere.

(4) Hold up Time: Prior to filtration the product was held in a stirred tank. 
The average period of hold-up in the new plant was somewhat less than that 
used in the other plants but it could be easily increased. 

(5) Recycle: The only major difference between production facilities at the 
other plants and the present one lay in the introduction of a recycle stage which 
slightly increased conversion of the reagents prior to precipitation and filtration. 
Arguments were advanced which accounted for the longer filtration time in 
terms of this recycle stage. Arrangements could be made to temporarily elimi- 
nate the recycle stage. 

(6) Rate of Addition of Caustic Soda: Immediately prior to filtration a
quantity of caustic soda liquor was added resulting in precipitation of the
product. The addition rate was somewhat faster with the new plant but it was
possible to produce slower rates of addition.

(7) Type of Filter Cloth: The filter cloths employed in this plant were very
similar to those used at the other sites. However, they did come from a more
recently supplied batch and it was suggested that their performance should be
compared with cloths from previously supplied batches which were still available.

In the following design the minus version corresponds to the usual operation 
for the new plant and the plus version to the change. Thus we have 

- + 
(1) water town well 
(2) raw material on site other 
(3) temperature of filtration low high 
(4) hold up time low high 
(5) recycle included omitted 
(6) rate of addition NaOH fast slow 
(7) filter cloth new old 


The 2^{7-4}_{III} design with generators


I = 1 2 5, I = 1 3 6, I = 2 3 7, I = 1 2 3 4


was chosen. This design is equivalent to the 2^{7-4}_{III} design considered earlier,
but is obtained by a different association of variables, that is, 5 = 1 2, 6 = 1 3,
7 = 2 3 and 4 = 1 2 3. Eight experiments run in random order gave the filtration
times listed below.


Run    1  2  3  4  5  6  7    Filtration Time

 1     -  -  -  -  +  +  +        68.4
 2     +  -  -  +  -  -  +        77.7
 3     -  +  -  +  -  +  -        66.4
 4     +  +  -  -  +  -  -        81.0
 5     -  -  +  +  +  -  -        78.6
 6     +  -  +  -  -  +  -        41.2
 7     -  +  +  -  -  -  +        68.7
 8     +  +  +  +  +  +  +        38.7



The usual analysis gives the estimates


water                    l_1 = 1 + 25 + 36 + 47 = -10.9
raw material             l_2 = 2 + 15 + 37 + 46 =  -2.8
temperature              l_3 = 3 + 16 + 27 + 45 = -16.6
hold up                  l_4 = 4 + 35 + 26 + 17 =   0.5
recycle                  l_5 = 5 + 12 + 34 + 67 =   3.2
rate of addition NaOH    l_6 = 6 + 13 + 24 + 57 = -22.8
filter cloth             l_7 = 7 + 23 + 14 + 56 =  -3.4
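
The estimates can be recomputed from the eight filtration times. The Python sketch below rests on two assumptions that are not stated in the text: that the runs are listed in standard order of the variables 1, 2 and 3 (variable 1 changing fastest), and that the second filtration time is 77.7. Both are supported only by the fact that they reproduce the printed estimates to the accuracy shown.

```python
from itertools import product

# Filtration times, assumed to be listed in standard order (see lead-in).
times = [68.4, 77.7, 66.4, 81.0, 78.6, 41.2, 68.7, 38.7]

# Generators I = 125 = 136 = 237 = 1234, that is 5 = 12, 6 = 13, 7 = 23, 4 = 123.
design = [{1: x1, 2: x2, 3: x3, 4: x1 * x2 * x3, 5: x1 * x2, 6: x1 * x3, 7: x2 * x3}
          for x3, x2, x1 in product((-1, 1), repeat=3)]

def estimate(variable):
    """Average at the plus version minus average at the minus version."""
    contrast = sum(run[variable] * y for run, y in zip(design, times))
    return contrast / 4

estimates = {v: estimate(v) for v in range(1, 8)}
for v, value in estimates.items():
    print(f"l_{v} = {value:.3f}")
```

Each value is the estimate of the main effect of that variable together with its three aliased two-factor interactions.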


The estimates -10.9, -16.6, and -22.8 are suspiciously large when compared
to the others. The simplest interpretation of the results would be that the main
effects of the factors 1, 3 and 6 were important. However, many other
interpretations are possible. Among these would be that the main effects of factors 3
and 6 and the interaction 3 6 (which is associated with 1) were responsible for the
observed results. Equivalently the main effects of 1 and 6 with 1 6, or 1 and
3 with 1 3, could be responsible. It was decided therefore to repeat the design
with reverse signs, yielding the following results:


Run    1  2  3  4  5  6  7    Filtration Time

 1     +  +  +  +  -  -  -        66.7
 2     -  +  +  -  +  +  -        65.0
 3     +  -  +  -  +  -  +        86.4
 4     -  -  +  +  -  +  +        61.9
 5     +  +  -  -  -  +  +        47.8
 6     -  +  -  +  +  -  +        59.0
 7     +  -  -  +  +  +  -        42.6
 8     -  -  -  -  -  -  -        67.6


The linear combinations estimated from this second design alone are


l'_1 = -1 + 25 + 36 + 47
l'_2 = -2 + 15 + 37 + 46
l'_3 = -3 + 16 + 27 + 45
l'_4 = -4 + 35 + 26 + 17
l'_5 = -5 + 12 + 34 + 67
l'_6 = -6 + 13 + 24 + 57
l'_7 = -7 + 23 + 14 + 56


Whence by taking sums and differences of the linear combinations provided 
by the two component designs we obtain for the aggregate design 


  1        25 + 36 + 47
  2        15 + 37 + 46
  3        16 + 27 + 45
  4        35 + 26 + 17
  5        12 + 34 + 67
  6        13 + 24 + 57
  7        23 + 14 + 56


On review it seemed likely that the effect -19.2 associated with factor 6, and
the effect -16.2 associated with the linear combination (1 6 + 2 7 + 4 5), were
real. It was also to be noted that the largest of the remaining effects,
-6.7, was associated with factor 1. The most likely explanation of the data





THE 2^{k-p} FRACTIONAL FACTORIAL DESIGNS


                                 Water Supply
                            Reservoir      Well

Rate of            Slow       65.4         42.6
Addition of NaOH
                   Fast       68.5         78.0


Figure 1
Two-way table of average responses


therefore was that variables 1 and 6 both have effects and that they interact.
A two-way table of average values exemplifying these effects is shown
in Figure 1. It should be noted that other explanations of the data are quite
possible. For example, the large effect attributed to the interaction between
factors 1 and 6 could be attributed equally well to the interaction 2 7 or 4 5.
The fact that none of the factors 2, 4, 5 and 7 has a main effect does not, of
course, preclude the possibility that their interactions exist. In fact, in terms of
the response surface, if the center conditions of the experiment are located on
the crest of a diagonally running ridge we should expect exactly this situation
to occur. Of the possible explanations, however, that involving 1 and 6 and
their two-factor interaction seemed by far the most likely. The crucial test
was whether the trouble would be cured by using well water and the slow addition
rate of caustic soda while leaving the other variables at their usual levels.

A number of additional trials were run on the plant in which the only modifi-
cations made were the use of well water with a slow rate of addition of caustic.
These runs did give satisfactorily short filtration times in the neighborhood of
forty minutes and the modification was adopted.
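
The sums and differences described above, and the two-way table of Figure 1, can be reproduced from the sixteen filtration times. This is a sketch under stated assumptions: runs are taken in standard order for variables 1, 2 and 3, the second design uses exactly the reversed signs of the first, and the illegible second response of the first design is taken as 77.7. The variable names are illustrative.

```python
from itertools import product

# Sign pattern of the first design (5 = 12, 6 = 13, 7 = 23, 4 = 123).
base = []
for x3, x2, x1 in product((-1, 1), repeat=3):
    base.append({1: x1, 2: x2, 3: x3, 4: x1 * x2 * x3,
                 5: x1 * x2, 6: x1 * x3, 7: x2 * x3})

first = [68.4, 77.7, 66.4, 81.0, 78.6, 41.2, 68.7, 38.7]    # 77.7 is assumed
second = [66.7, 65.0, 86.4, 61.9, 47.8, 59.0, 42.6, 67.6]   # all signs reversed

def contrast(factor, ys):
    """Contrast computed with the sign pattern of the FIRST design."""
    return sum(r[factor] * y for r, y in zip(base, ys)) / 4.0

main, strings = {}, {}
for f in range(1, 8):
    l_first = contrast(f, first)     # main effect + interaction string
    l_second = contrast(f, second)   # -main effect + interaction string
    main[f] = (l_first - l_second) / 2
    strings[f] = (l_first + l_second) / 2

# Two-way table of averages for variables 1 (water) and 6 (NaOH addition rate).
cells = {}
for sign, ys in ((1, first), (-1, second)):
    for r, y in zip(base, ys):
        cells.setdefault((sign * r[1], sign * r[6]), []).append(y)
table = {levels: sum(v) / len(v) for levels, v in cells.items()}

print(round(main[1], 2), round(main[6], 2), round(strings[3], 2))
print({k: round(v, 2) for k, v in sorted(table.items())})
```

In `table` the key (+1, +1) is the cell for the + versions of both variables, which Figure 1 labels as well water with slow addition.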


5. Resolution IV Designs


We have seen that a valuable design can be generated by switching the
signs of all the variables in a 2^{7-4}_{III} fractional factorial and adding the resultant
design to the original fraction. This aggregate design, which uses sixteen runs,
makes it possible to estimate all seven main effects clear of the two-factor inter-
actions. The design is thus of resolution IV. In fact, it is a 2^{7-3}_{IV} design. It is
possible to do even slightly better than this. The signs of the elements correspond-
ing to the identity column I can also be switched, and the resulting set of eight
positive and eight negative signs can be associated with an eighth factor. The
final design is shown in Table 15.

We call such a design a “fold over’’ design. 
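
The fold-over construction can be sketched in a few lines. The example below assumes the principal fraction is written in standard order with 4 = 12, 5 = 13, 6 = 23 and 7 = 123, and an identity column standing for the eighth factor; the function names are illustrative. It then confirms that every main-effect column is orthogonal to every two-factor interaction column, which is the resolution IV property claimed in the text.

```python
from itertools import combinations, product

def principal_fraction():
    """Eight runs of the 2^(7-4) fraction plus an identity column for factor 8."""
    rows = []
    for x3, x2, x1 in product((-1, 1), repeat=3):   # x1 changes fastest
        rows.append([x1, x2, x3, x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3, 1])
    return rows

def fold_over(rows):
    """Append every run again with all signs reversed."""
    return rows + [[-x for x in row] for row in rows]

design = fold_over(principal_fraction())

def column(j):
    return [row[j] for row in design]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Main effects are mutually orthogonal ...
mains_orthogonal = all(dot(column(i), column(j)) == 0
                       for i, j in combinations(range(8), 2))

# ... and each is orthogonal to every two-factor interaction column.
mains_clear = all(
    dot(column(i), [a * b for a, b in zip(column(j), column(k))]) == 0
    for i in range(8) for j, k in combinations(range(8), 2)
)
print(len(design), mains_orthogonal, mains_clear)
```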

We must now consider what the generators and hence the defining relations
are for this design. Each component group of eight runs can be regarded as a
2^{8-5} design with generating relations


I = 8 = 124 = 135 = 236 = 1237

and

I = -8 = -124 = -135 = -236 = 1237





TABLE 15
A 2^{8-4}_{IV} fold over design

                              1   2   3   4   5   6   7   8

                              -   -   -   +   +   +   -   +
                              +   -   -   -   -   +   +   +
                              -   +   -   -   +   -   +   +
Principal fraction            +   +   -   +   -   -   -   +
2^{7-4}_{III}                 -   -   +   +   -   -   +   +
                              +   -   +   -   +   -   -   +
                              -   +   +   -   -   +   -   +
                              +   +   +   +   +   +   +   +

                              +   +   +   -   -   -   +   -
                              -   +   +   +   +   -   -   -
                              +   -   +   +   -   +   -   -
Principal fraction with       -   -   +   -   +   +   +   -
all signs reversed            +   +   -   -   +   +   -   -
                              -   +   -   +   -   +   +   -
                              +   -   -   +   +   -   +   -
                              -   -   -   -   -   -   -   -


respectively. Applying the rule for combining fractions we notice at once that
1 2 3 7 is a generator for the aggregate design, and the remaining three generators
are independent even products of 8, 1 2 4, 1 3 5 and 2 3 6. In particular, we
can use 1 2 4 8, 1 3 5 8 and 2 3 6 8, so that finally the generating relations for
the aggregate design are:


I = 1248 = 1358 = 2368 = 1237                                        (22)

The defining relation is therefore

I = 1248 = 1358 = 2368 = 1237 = 2345 = 1346 = 3478
  = 1256 = 2578 = 1678 = 4568 = 2467 = 1457 = 3567
  = 12345678


The generating relations for all sixteen of the 2^{8-4}_{IV} fractionals are:
I = ±1248, I = ±1358, I = ±2368 and I = ±1237.
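
The defining relation follows mechanically from the generators: multiply every non-empty subset of generators, cancelling any symbol that appears an even number of times. A minimal sketch, with words held as sets of factor numbers (the representation and function names are illustrative choices):

```python
from itertools import combinations

def multiply(*words):
    """Product of words: a symbol survives only if it occurs an odd number of times."""
    result = frozenset()
    for word in words:
        result = result ^ frozenset(word)   # symmetric difference
    return result

def defining_relation(generators):
    """All products of one or more generators (the identity I is left out)."""
    words = set()
    for r in range(1, len(generators) + 1):
        for subset in combinations(generators, r):
            words.add(multiply(*subset))
    return words

generators = [(1, 2, 4, 8), (1, 3, 5, 8), (2, 3, 6, 8), (1, 2, 3, 7)]
words = defining_relation(generators)
resolution = min(len(w) for w in words)   # length of the shortest word

for w in sorted(words, key=lambda w: (len(w), sorted(w))):
    print("".join(str(s) for s in sorted(w)))
print("resolution:", resolution)
```

Four generators give 2^4 - 1 = 15 words; fourteen have four symbols and one has eight, so the shortest word has length four and the design is of resolution IV.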


Ignoring interactions between three or more factors, and using the principal
defining relation, the sixteen quantities which can now be estimated from the
principal one-sixteenth fraction are given in Table 16.


TABLE 16
Effects estimable using the 2^{8-4}_{IV} design


1 average

8 main effects

7 sets of two-factor                 12 + 37 + 48 + 56
interactions confounded              13 + 27 + 58 + 46
in groups of four                    14 + 28 + 36 + 57
                                     15 + 38 + 26 + 47
                                     16 + 78 + 34 + 25
                                     17 + 23 + 68 + 45
                                     18 + 24 + 35 + 67


As before, further fractions can be performed in combination with the original
fraction to isolate particular two-factor interactions or combinations of two-
factor interactions. It will be seen now that when a design containing 2^{q+1} runs
is formed from a design containing 2^q runs by replicating the 2^q design with
reversed signs and associating some further factor X with the 2^q plus ones and
2^q minus ones, then a general rule for obtaining the generators and defining
relation of the new design from the generators and defining relation of the
old design is as follows: 1) All generators which contain an even number of
characters in the original design are retained as generators in the new design.
2) All generators which contain an odd number of characters in the original
design will be reproduced containing the extra character X as generators in
the new design. For example, the generator 1 3 4 will become 1 3 4 X.
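
The two-part rule above is easy to state as a function. This is a sketch only; `fold_over_generators` and the tuple representation of words are illustrative names, not notation from the paper.

```python
def fold_over_generators(generators, extra):
    """Apply the fold-over rule: keep even-length words, append `extra` to odd ones."""
    new_generators = []
    for word in generators:
        if len(word) % 2 == 0:
            new_generators.append(tuple(word))             # rule 1: retained
        else:
            new_generators.append(tuple(word) + (extra,))  # rule 2: gains X
    return new_generators

# Generators of the eight-run 2^(7-4) fraction used earlier; 8 is the new factor.
old = [(1, 2, 4), (1, 3, 5), (2, 3, 6), (1, 2, 3, 7)]
print(fold_over_generators(old, 8))

# The example from the text: 1 3 4 becomes 1 3 4 X.
print(fold_over_generators([(1, 3, 4)], "X"))
```

Applied to 1 2 4, 1 3 5, 2 3 6 and 1 2 3 7 with the new factor 8, the rule returns the four generators of relation (22).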


An Alternative Method for Generating Designs of Resolution IV 


An inspection of the generators for the 2^{8-4}_{IV} design just described will show
that an alternative method for constructing this design would be to write down
in standard order the sixteen combinations of variables for a complete 2^4 factorial,
and then to associate further factors with the four three-factor interactions.
To demonstrate, let the 2^4 factorial be written down in terms of the variables
1, 2, 3 and 8. The four three-factor interactions are then 1 2 8, 1 3 8, 2 3 8 and
1 2 3. These can now be associated with the four new variables 4, 5, 6 and 7 to
give the set of four generators


1 2 8 4
1 3 8 5
2 3 8 6
1 2 3 7


The design thus constructed is identical to that given in Table 15. The only
reason, of course, for starting off with variables 1, 2, 3 and 8 instead of 1, 2, 3
and 4 is to show the identity between this method of construction and the
previous one.

As a further example of this second method for constructing resolution IV
designs let us construct the 2^{16-11}_{IV} design. Since the design contains 32 runs we
begin by writing down the full 2^5 factorial in the variables 1, 2, 3, 4 and 5.
Eleven additional variables are now introduced by associating them with the
ten three-factor interactions and the single five-factor interaction. We thus
have for the set of eleven generators

1 2 3 6,   1 2 4 7,   1 2 5 8,   1 3 4 9,   1 3 5 10,   1 4 5 11,
2 3 4 12,  2 3 5 13,  2 4 5 14,  3 4 5 15,  1 2 3 4 5 16.            (25)

If three-factor and higher order interaction terms are negligible, thirty-two
independent estimates can be obtained. They include the grand average, the
sixteen main effects 1, 2, 3, ..., 16; and the fifteen combinations of two-factor
interactions displayed below

1 2  + 15 16 + 3 6   + 4 7   + 5 8   + 9 12  + 10 13 + 11 14
1 3  + 2 6   + 14 16 + 4 9   + 5 10  + 11 15 + 7 12  + 8 13
1 4  + 2 7   + 3 9   + 13 16 + 5 11  + 6 12  + 10 15 + 8 14
1 5  + 2 8   + 3 10  + 4 11  + 12 16 + 6 13  + 7 14  + 9 15
1 6  + 2 3   + 14 15 + 4 12  + 5 13  + 11 16 + 7 9   + 8 10
1 7  + 2 4   + 3 12  + 13 15 + 5 14  + 6 9   + 10 16 + 8 11
1 8  + 2 5   + 3 13  + 4 14  + 12 15 + 6 10  + 7 11  + 9 16
1 9  + 2 12  + 3 4   + 10 11 + 5 15  + 6 7   + 13 14 + 8 16
1 10 + 2 13  + 3 5   + 4 15  + 12 14 + 6 8   + 7 16  + 9 11
1 11 + 2 14  + 3 15  + 4 5   + 9 10  + 6 16  + 7 8   + 12 13
1 12 + 2 9   + 3 7   + 4 6   + 5 16  + 10 14 + 8 15  + 11 13
1 13 + 2 10  + 3 8   + 4 16  + 5 6   + 11 12 + 7 15  + 9 14
1 14 + 2 11  + 3 16  + 4 8   + 5 7   + 6 15  + 9 13  + 10 12
1 15 + 2 16  + 3 11  + 4 10  + 5 9   + 6 14  + 7 13  + 8 12
1 16 + 2 15  + 3 14  + 4 13  + 5 12  + 6 11  + 7 10  + 8 9













In general, a resolution IV design may always be constructed by first writing 
down the design matrix for a two-level factorial and then associating new vari- 
ables with all those interaction columns having an odd number of numerals. 
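
The general construction can be sketched as follows: write down a full two-level factorial, then add one new column for every interaction of three, five, seven, ... basic variables. The function names are illustrative, and the check at the end is the working definition of resolution IV used in the text (no main effect correlated with any two-factor interaction).

```python
from itertools import combinations, product
from math import prod

def resolution_iv_design(n_basic):
    """Full 2^n_basic factorial plus one column per odd-order (>= 3) interaction."""
    basic = [list(reversed(t)) for t in product((-1, 1), repeat=n_basic)]
    odd_sets = [s for r in range(3, n_basic + 1, 2)
                for s in combinations(range(n_basic), r)]
    design = [row + [prod(row[i] for i in s) for s in odd_sets] for row in basic]
    return design, odd_sets

def main_effects_clear(design):
    """True if every column is orthogonal to every product of two columns."""
    n_cols = len(design[0])
    for i in range(n_cols):
        for j, k in combinations(range(n_cols), 2):
            if sum(row[i] * row[j] * row[k] for row in design) != 0:
                return False
    return True

d16, added16 = resolution_iv_design(4)   # 4 basic + 4 added = 8 variables
d32, added32 = resolution_iv_design(5)   # 5 basic + 11 added = 16 variables
print(len(d16), len(d16[0]), main_effects_clear(d16))
print(len(d32), len(d32[0]), main_effects_clear(d32))
```

With four basic variables this gives eight variables in sixteen runs, and with five basic variables sixteen variables in thirty-two runs, matching the two designs discussed above.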

Of course, this 2^{16-11}_{IV} design could have been obtained by fold-over by first
writing down the 2^{15-11}_{III} design, the saturated resolution III design for fifteen
variables in sixteen runs. The eleven generators for this design are given in Table
17a. In Table 17a the variables are numbered from 2 to 16 to make the equiva-
lence between the two methods of construction evident. The generators for
the 2^{16-11}_{IV} obtained by fold-over are shown in Table 17b. These generators are
obtained by attaching the variable 1 to every word in the generating relation
of the 2^{15-11}_{III} having an odd number of symbols. The generators, and hence the
design obtained by fold-over, are thus identical to those displayed earlier in
Eq. (25). The same principle of fold-over may be used with the Plackett and
Burman designs. For example, using the Plackett and Burman design for k = 11
variables in twelve runs we may derive a design usable for studying twelve
variables in twenty-four runs in which no two-factor interaction is aliased with
any main effect.


TABLE 17a                               TABLE 17b

Generating Relation 2^{15-11}_{III}     Generating Relation for 2^{16-11}_{IV}
                                        Obtained by Fold-over

2 3 6                                   1 2 3 6
2 4 7                                   1 2 4 7
2 5 8                                   1 2 5 8
3 4 9                                   1 3 4 9
3 5 10                                  1 3 5 10
4 5 11                                  1 4 5 11
2 3 4 12                                2 3 4 12
2 3 5 13                                2 3 5 13
2 4 5 14                                2 4 5 14
3 4 5 15                                3 4 5 15
2 3 4 5 16                              1 2 3 4 5 16
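
The Plackett and Burman case can be checked numerically. The sketch below assumes the commonly tabulated generating row for the twelve-run design (cyclic shifts of `first_row` plus a final row of minus signs); that row is not given in this paper and is an assumption of the example. Folding over with an added identity column gives twelve columns in twenty-four runs.

```python
from itertools import combinations

# Commonly tabulated first row of the 12-run design (an assumption of this sketch).
first_row = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]

def plackett_burman_12():
    """Eleven cyclic shifts of the first row plus a row of minus signs."""
    rows = [first_row[-s:] + first_row[:-s] for s in range(11)]
    rows.append([-1] * 11)
    return rows

def fold_over_with_indicator(rows):
    """Add a column of plus signs, then repeat every run with all signs reversed."""
    plus = [row + [1] for row in rows]
    return plus + [[-x for x in row] for row in plus]

pb12 = plackett_burman_12()
design = fold_over_with_indicator(pb12)
n_cols = len(design[0])

pb_orthogonal = all(sum(r[i] * r[j] for r in pb12) == 0
                    for i, j in combinations(range(11), 2))
mains_orthogonal = all(sum(r[i] * r[j] for r in design) == 0
                       for i, j in combinations(range(n_cols), 2))
mains_clear = all(sum(r[i] * r[j] * r[k] for r in design) == 0
                  for i in range(n_cols)
                  for j, k in combinations(range(n_cols), 2))
print(len(design), n_cols, pb_orthogonal, mains_orthogonal, mains_clear)
```

The last check holds for any fold-over design: each run is paired with its mirror image, so every product of three columns sums to zero.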


Complete Factorials within Fractionals Applied to Screening 





When little is known about the variables which affect a particular response
we are in what may be called a screening situation. That is to say, although
it is necessary to test a rather large number of variables which might conceivably
have important effects, it can be realistically postulated that only a few, perhaps
one, two or three of the variables, will be of major importance. Whichever
variables do turn out to be of major influence may of course interact with one
another. To put this argument in another way, we may have a fairly large
number, say eight variables, which are of possible importance, but we believe
the effects of all but, say, three of these are likely to be negligible. Thus, we
tentatively entertain the idea that at least five of the variables can be regarded
essentially as dummies, but we don't know which five. In these circumstances we
need a design in the complete set of eight variables which will produce a com-
plete factorial in any three of the component variables. Thus, although we
don't know which subset of the variables will turn out to be important, which-
ever subset does provides a full factorial, or even a replicated factorial, in
those variables.


Figure 2. Projection of 2^{3-1}_{III} into three 2^2 factorials.

The basic idea is illustrated in the very simplest case for the one-half repli-
cate of the 2^3 factorial shown in Figure 2. Suppose the total number of variables
considered is three, but it can be reasonably postulated that not more than two
have any real effects. Then we see from Figure 2 that the design supplies a
complete factorial in any of the three pairs of variables since each projection
of the 2^{3-1}_{III} design into a two dimensional plane produces a complete factorial
design. This is also apparent from inspection of the design matrix since, if we
drop any one column of the design matrix, the remaining two columns provide
a full 2^2 factorial. This can be seen even more simply, for the generating relation
for this design is I = 1 2 3 and if any one of the variables is dropped the generator
will vanish, showing that the resulting design is a full rather than a fractional
factorial.

In general, it is clear that a design of resolution R will provide a complete
factorial in any subset of (R - 1) variables. This must be so since every
word in the defining relation contains R or more characters. It follows that if
all but (R - 1) characters are treated as dummies, then every word in the
defining relation will disappear.
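
The projection argument can be verified directly for the half replicate with I = 1 2 3. This is a small sketch; the helper names are illustrative.

```python
from collections import Counter
from itertools import combinations, product

# Half replicate of the 2^3 factorial with generator I = 123, that is, 3 = 12.
half_fraction = [(x1, x2, x1 * x2) for x2, x1 in product((-1, 1), repeat=2)]

def projection_counts(design, keep):
    """Count how often each combination of versions occurs in the kept columns."""
    return Counter(tuple(row[j] for j in keep) for row in design)

def is_full_factorial(design, keep):
    """True if every one of the 2^len(keep) combinations occurs equally often."""
    counts = projection_counts(design, keep)
    return len(counts) == 2 ** len(keep) and len(set(counts.values())) == 1

pairs_full = [is_full_factorial(half_fraction, pair)
              for pair in combinations(range(3), 2)]
all_three_full = is_full_factorial(half_fraction, (0, 1, 2))
print(pairs_full, all_three_full)
```

Every pair of columns gives the full 2^2 factorial, while all three columns together give only four of the eight possible combinations, as expected for a design of resolution III.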


2^3 Factorials within Resolution IV Designs


If a design of resolution IV contains r × 2^3 runs then it can be regarded as
providing r replicates of a full factorial in any three variables. As an example,
consider the sixteen-run resolution IV design for eight variables, i.e. the
2^{8-4}_{IV} design. This design can be regarded as providing a twice replicated
2^3 factorial for every one of the fifty-six choices of three variables out of
eight. Geometrically, this means that the sixteen points in eight dimensional
space can be projected into any one of the fifty-six three dimensional coordinate
sub-spaces to produce a replicated cube. The reader can readily confirm for
himself that the omission of any five columns from Table 15 provides a twice
replicated factorial in the remaining variables.

As always, evidence from experiments of this kind should only be regarded as 
suggestive and subject to confirmation rather than as supplying definite proof. 
Alternative explanations of the results obtained from such experiments involving 
higher order interactions could be easily produced. However, in selecting alter- 
native explanations as worthy of further study we rest heavily upon our prior 
beliefs about the plausibility of these alternatives. 

It is interesting to note an early use of designs of this kind by Tippett [15]. 
An adequate statement of the proper attitude towards the results is to be found 
in a discussion by R. A. Fisher [12] of Tippett’s example. 































General Rules for Designs Obtained by Projection 





We have seen that a design of resolution R provides a complete factorial in
any sub-set of (R - 1) variables. In particular, designs of resolution III may
be used for screening up to two variables out of N - 1 variables, designs of
resolution IV may be used for screening up to three variables out of N/2 vari-
ables, and designs of resolution V, which we shall discuss later, may be used
for screening up to four variables out of a larger number. If a design of resolu-
tion R is used to screen subsets of R variables, then full factorials will result
for certain subsets, and fractional factorials for others. Those subsets of
variables providing fractional factorials are simply subsets which appear as
words in the final defining relation. For example, consider the 2^{8-4}_{IV} design
discussed earlier. Its defining relation is


I = 1248 = 1358 = 2368 = 1237 = 2345 = 1346 = 3478
  = 1256 = 2578 = 1678 = 4568 = 2467 = 1457 = 3567
  = 12345678.


Regarded as a design to screen sub-sets of four variables, this design will provide
replicated half-fractions for the fourteen combinations of variables 1, 2, 4 and 8;
1, 3, 5 and 8; 2, 3, 6 and 8; etc., which appear as four-letter words in the defining
relation, and complete 2^4 factorial designs for any one of the remaining fifty-six
combinations of four variables. In the case of resolution V designs we can, in
accordance with our general rule, obtain full factorials in any set of four variables.
These designs would, for most purposes, also be adequate for screening five
variables because even for those combinations of variables which appear as
words in the defining relation, one-half replicates would be available, and these
would permit all main effects and two factor interactions to be distinguished,
on the assumption of course that higher order interaction effects are negligible.
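
The counts quoted for the sixteen-run design can be confirmed by enumeration. The sketch below builds the design from a 2^4 factorial in variables 1, 2, 3 and 8 with 4 = 128, 5 = 138, 6 = 238 and 7 = 123, then projects it onto every set of three and every set of four variables. The names are illustrative.

```python
from collections import Counter
from itertools import combinations, product

rows = []
for x8, x3, x2, x1 in product((-1, 1), repeat=4):
    rows.append({1: x1, 2: x2, 3: x3, 8: x8,
                 4: x1 * x2 * x8, 5: x1 * x3 * x8,
                 6: x2 * x3 * x8, 7: x1 * x2 * x3})

def counts(keep):
    """How often each combination of versions occurs in the kept variables."""
    return Counter(tuple(r[v] for v in keep) for r in rows)

def is_full(keep):
    return len(counts(keep)) == 2 ** len(keep)

triples = list(combinations(range(1, 9), 3))
quads = list(combinations(range(1, 9), 4))

full_triples = [t for t in triples if is_full(t)]
twice_replicated = all(set(counts(t).values()) == {2} for t in full_triples)
fractional_quads = [q for q in quads if not is_full(q)]

print(len(triples), len(full_triples), twice_replicated)
print(len(quads), len(fractional_quads))
print(fractional_quads)
```

The fourteen sets of four variables that fail to give a full factorial are exactly the fourteen four-letter words of the defining relation; the other fifty-six sets each give a complete 2^4 factorial.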














Example 


The problem of analyzing these designs can be thought of either as picking
out the one, two or three variables whose main effects and interactions can
account for all the effects found, or equivalently as looking for sets of repli-
cates within the runs. As an example, consider the data given in Table 18,
obtained from a screening experiment containing eight variables, using the
generators 1 2 4 8, 1 3 5 8, 2 3 6 8 and 1 2 3 7. The estimated effects are given
in Table 19a.


TABLE 18
[Design matrix in variables 1 to 8 and responses for the sixteen runs; entries not legible]


TABLE 19a
[Estimated effects; numerical values not legible]

Average
Main effects 1, 2, 3, 4, 5, 6, 7, 8
12 + 37 + 48 + 56
13 + 27 + 58 + 46
14 + 28 + 36 + 57
15 + 38 + 26 + 47
16 + 78 + 34 + 25
17 + 23 + 68 + 45
18 + 24 + 35 + 67


TABLE 19b
[Duplicated 2^3 factorial in variables 3, 8 and 5; sign columns and one pair of responses not legible]

Responses

60.4    62.1
75.4    73.0
61.2    59.6
67.3    66.7
66.0    63.3
82.9    82.4
68.1    71.3
























































It was not expected that more than a few of the eight variables would in
fact have important effects upon the response, and it will be seen that the data
are readily explained by supposing that the important variables are 3, 5 and 8.
The main effects and two-factor interactions associated with these variables
are underlined in Table 19a. On this explanation runs 1 and 3, 2 and 4, 5 and 7,
6 and 8, 9 and 11, 10 and 12, 13 and 15, and finally 14 and 16 are essentially
duplicates one of the other, differing mainly because of experimental error and
partly because of effects of the other variables of lesser importance. The
data are rearranged as a duplicated 2^3 factorial in variables 3, 8 and 5
in Table 19b.

In an experiment of this kind it would have been advantageous to have
available some independent estimate of pure error obtained, for example, from
duplication of certain of the runs selected in accordance with principles de-
scribed elsewhere [10]. In such a case we could then compare the size of the
error obtained from the "constructed" duplicates with that from known
duplicates.





Blocking for Designs of Resolution IV 





Assuming that interactions between three or more variables are negligible,
the 2^{8-4}_{IV} design with generators 1 2 3 7, 1 2 4 8, 1 3 5 8 and 2 3 6 8 provides in-
dependent estimates of the eight main effects and of seven groups of two-factor
interactions. By using the + and - signs associated with the interaction columns,
this design can be broken into either two, four or eight equal sized blocks which
are unconfounded with main effects.

For example, we may use the + and - signs associated with the two-factor
interaction set 1 2 + 3 7 + 4 8 + 5 6 to define two blocks, if we call the block
contrast B_1 and put B_1 = 1 2 + 3 7 + 4 8 + 5 6. To break the design into
four equal blocks, two columns associated with the interaction sets may be
used. For example, we might choose B_1 = 1 2 + 3 7 + 4 8 + 5 6 and B_2 =
1 3 + 2 7 + 5 8 + 4 6. Each of the four blocks will contain the four runs identi-
fied by the pairs of versions (±, ±), that is, the four sets of versions (-, -),
(+, -), (-, +), (+, +) provided by the two interaction columns. The block
interaction effect B_1B_2 will then be found to be associated with another two-
factor interaction set, that is, B_1B_2 = 1 7 + 2 3 + 6 8 + 4 5. This can be con-
firmed by actually multiplying out the elements of the columns associated with
B_1 and B_2, or simply by noting, for example, that products of the interaction
elements in B_1 and B_2 are 1 2 × 1 3 = 2 3, 1 2 × 2 7 = 1 7, 5 6 × 4 6 = 4 5,
etc.

To break the design into eight equal blocks, three of the interaction sets
must be used. However, in choosing the third set we may not use the column
associated with the interaction B_1B_2, although any of the remaining interaction
columns may be used. Let us choose B_3 = 1 8 + 2 4 + 3 5 + 6 7. Each of the
blocks will now contain the two runs identified by the eight sets of versions
(±, ±, ±) provided by B_1, B_2 and B_3. It can be readily confirmed by multi-
plying out the elements of the block columns that the complete set of seven
two-factor interaction comparisons is now used up, that is,


B_1       = 12 + 37 + 48 + 56
B_2       = 13 + 27 + 58 + 46
B_3       = 18 + 24 + 35 + 67
B_1B_2    = 17 + 23 + 68 + 45                                        (26)
B_1B_3    = 14 + 28 + 36 + 57
B_2B_3    = 15 + 38 + 26 + 47
B_1B_2B_3 = 16 + 78 + 34 + 25


As the reader can confirm for himself, subject only to the condition that
B_3 must not be chosen so as to coincide with B_1B_2, the association between
blocks and two-factor interactions can be made in any other way whatever.

Tables 20a and 20b show how we can write out a 2^{8-4}_{IV} design arranged
in eight blocks of two runs each. Since the complete design contains sixteen runs,
we begin by writing down a 2^4 factorial in four of the eight variables. (In order
that the final design may be compared with designs obtained previously, we
chose these variables to be 1, 2, 3 and 8 although they could just as easily have
been chosen to be 1, 2, 3 and 4.) The variables 4, 5, 6 and 7 are then associated
with the three-factor interactions. The generators are 1 2 3 7, 1 2 4 8, 1 3 5 8 and
2 3 6 8. As illustrated in Table 20a we now write down the three columns corre-
sponding to the interactions 1 2, 1 3, and 1 8 and associate these with the block
factors B_1, B_2 and B_3. The eight blocks are then obtained by putting those
pairs of runs for which B_1, B_2 and B_3 are (- - -) into the first block, the pair
of runs for which B_1, B_2 and B_3 are (+ - -) in the second block and so on.


TABLE 20a
Construction of 2^{8-4}_{IV} in Blocks of Two

                      7    4    5    6     B_1  B_2  B_3
  1   2   3   8      123  128  138  238     12   13   18     Block

  -   -   -   -       -    -    -    -      +    +    +        8
  +   -   -   -       +    +    +    -      -    -    -        1
  -   +   -   -       +    +    -    +      -    +    +        7
  +   +   -   -       -    -    +    +      +    -    -        2
  -   -   +   -       +    -    +    +      +    -    +        6
  +   -   +   -       -    +    -    +      -    +    -        3
  -   +   +   -       -    +    +    -      -    -    +        5
  +   +   +   -       +    -    -    -      +    +    -        4
  -   -   -   +       -    +    +    +      +    +    -        4
  +   -   -   +       +    -    -    +      -    -    +        5
  -   +   -   +       +    -    +    -      -    +    -        3
  +   +   -   +       -    +    -    -      +    -    +        6
  -   -   +   +       +    +    -    -      +    -    -        2
  +   -   +   +       -    -    +    -      -    +    +        7
  -   +   +   +       -    -    -    +      -    -    -        1
  +   +   +   +       +    +    +    +      +    +    +        8

In Table 20b it will be noted that the second run in each block is the fold-
over, or mirror image, of the first run, that is, the versions of one run are exactly
reversed in the second. Suppose now that a single run is taken from each block
such that one of the variables always appears with the same sign. Choosing,
for example, those runs with the + version of variable 1 we obtain the array
given in Table 21.

The reader will note that the result, omitting variable 1, is the 2^{7-4}_{III} design in
the variables 2, 3, 4, ..., 8 with generating relation I = 2 3 7 = 2 8 4 = 3 8 5 =
2 3 8 6. This design is identical, except for the number identification given the
seven variables, to the 2^{7-4}_{III} design described earlier.
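
The blocking scheme and the reduction to Table 21 can be checked by enumeration. The sketch below assumes the construction described in the text (a 2^4 factorial in 1, 2, 3 and 8 with 7 = 123, 4 = 128, 5 = 138, 6 = 238, and blocks defined by the columns 1 2, 1 3 and 1 8); the variable names are illustrative.

```python
from itertools import product

runs = []
for x8, x3, x2, x1 in product((-1, 1), repeat=4):
    runs.append({1: x1, 2: x2, 3: x3, 8: x8,
                 7: x1 * x2 * x3, 4: x1 * x2 * x8,
                 5: x1 * x3 * x8, 6: x2 * x3 * x8})

# Group runs by the versions of the block factors (B1, B2, B3) = (12, 13, 18).
blocks = {}
for r in runs:
    key = (r[1] * r[2], r[1] * r[3], r[1] * r[8])
    blocks.setdefault(key, []).append(r)

# Each block should hold two runs that are mirror images of one another.
mirror_pairs = all(
    len(b) == 2 and all(b[0][v] == -b[1][v] for v in range(1, 9))
    for b in blocks.values()
)

# Take from each block the run with the + version of variable 1.
picked = [next(r for r in b if r[1] == 1) for b in blocks.values()]

# The reduced design should satisfy I = 237 = 284 = 385 = 2386.
reduced_ok = all(
    r[7] == r[2] * r[3] and r[4] == r[2] * r[8] and
    r[5] == r[3] * r[8] and r[6] == r[2] * r[3] * r[8]
    for r in picked
)
print(len(blocks), mirror_pairs, len(picked), reduced_ok)
```

Because the two runs of a block are mirror images, every main-effect column sums to zero within each block, which is why block effects leave the main effects unconfounded.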


TABLE 21

  1   2   3   8   7   4   5   6

  +   -   -   -   +   +   +   -
  +   +   -   -   -   -   +   +
  +   -   +   -   -   +   -   +
  +   +   +   -   +   -   -   -
  +   -   -   +   +   -   -   +
  +   +   -   +   -   +   -   -
  +   -   +   +   -   -   +   -
  +   +   +   +   +   +   +   +





| 

















































































































We see now that the principle of fold-over can be modified slightly to provide
resolution IV designs automatically broken into blocks of two runs such that
the block effects are unconfounded with the main effects. We begin by writing
down the design matrix for the appropriate resolution III design in k variables
plus an additional column I consisting solely of plus signs. Each row of the
design matrix is then folded over, that is, repeated with all signs reversed.
The pairs of rows form blocks of two of a resolution IV design in k + 1 variables.
For example, the 2^{4-1}_{IV} in blocks of two is constructed by first writing down
the 2^{3-1}_{III} along with the column I, and then folding over each row to provide
four blocks of two runs each as illustrated in Table 22. This 2^{4-1}_{IV} design is
identical to that obtained earlier, and illustrated in Table 12, in the discussion
of blocking designs of resolution III. Similarly, the 2^{8-4}_{IV} design in blocks of
two obtained in Table 20b could also have been formed by using the principle of
fold-over, starting out with the 2^{7-4}_{III} along with a column vector of plus signs
and pairing each run with its fold-over. The same is true of the 2^{16-11}_{IV} obtained
by fold-over described in Table 17b. The designs obtained by folding over the
Plackett and Burman designs can also be broken into blocks of two runs each
using precisely the same device.


TABLE 22

Original 2^{3-1}_{III}        2^{4-1}_{IV} in Blocks of Two, I = 1234

  1   2   3   I                 1   2   3   4

  -   -   +   +                 -   -   +   +      Block 1
                                +   +   -   -

  +   -   -   +                 +   -   -   +      Block 2
                                -   +   +   -

  -   +   -   +                 -   +   -   +      Block 3
                                +   -   +   -

  +   +   +   +                 +   +   +   +      Block 4
                                -   -   -   -
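
The modified fold-over can be sketched for the smallest case. The example assumes the 2^{3-1}_{III} design is the one with 3 = 12, written in standard order, with a fourth column of plus signs; each row and its mirror image then form one block. The names are illustrative.

```python
from itertools import product

# 2^(3-1) design with 3 = 12, plus a column I of plus signs for the fourth variable.
half = [[x1, x2, x1 * x2, 1] for x2, x1 in product((-1, 1), repeat=2)]

# Fold over row by row: each run is paired with its mirror image to form a block.
blocks = [(row, [-x for x in row]) for row in half]
design = [run for pair in blocks for run in pair]

# The generator of the eight-run design is I = 1234.
generator_holds = all(a * b * c * d == 1 for a, b, c, d in design)

# Within every block each variable appears once at + and once at -,
# so differences between blocks cannot bias any main effect.
balanced_in_blocks = all(first[j] + second[j] == 0
                         for first, second in blocks for j in range(4))
print(len(design), generator_holds, balanced_in_blocks)
```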

In general, any resolution IV design provides an opportunity to obtain blocks
of size two whose effects do not confound any of the main effects. In doing this,
of course, we confound the two-factor interactions with blocks. Nevertheless,
the resulting designs are of considerable interest. Often, the comparisons be-
tween blocks merely represent influences upon the response having a somewhat
higher variation than that responsible for differences within blocks. In these
circumstances it is reasonable to think of the strings of two-factor interactions
simply as being estimated with a variance somewhat higher than that appropriate
for the main effects. For instance, these designs may be used where it is suspected
that a time trend may occur during the course of the trials. Provided proper
randomization is applied both to the order of runs within the blocks and to the
order of running the blocks themselves, the design is such that whereas
main effects are determined with a variance appropriate to successive observa-
tions, the strings of two-factor interactions are estimated with a variance
appropriate to pairs of runs made in random order in the presence of a time trend.


“Major” and “Minor” Variables in Resolution IV Designs


We have already seen that a 2^{k-p} design of resolution IV can be regarded from
two points of view: (1) it is a design suitable for providing estimates of the k
main effects even though two-factor interactions may occur, and (2) it is a
design suitable for providing unbiased estimates of all main effects and inter-
actions between any three of the factors if the others are of no importance.
The designs can be considered from still another point of view. Considering the
2^{16-11}_{IV} design as an example, we have seen how this design may be run in sixteen
blocks of two runs each, the blocks being obtained from four block generators
associated with two-factor interactions. Alternatively we can choose the four
block generators to represent actual variables. Suppose, for example, we have
four “major” variables for which we wish to estimate all the main effects and
all the interactions, and we have sixteen further variables which we believe
exert at most main effects, and may be conveniently viewed as “minor” variables.
Then this design may be employed, associating the four major variables with
the block generators and the remaining minor variables with the sixteen “main
effect” factors. Of course, all the effects among the major variables will now be
confounded with the sets of two-factor interactions of the minor variables.
However, since minor variables are believed to exert at most main effects, the
two-factor interactions between these variables are tentatively assumed to be nil.

In this connection, there is an opportunity to make use of any prior feeling
which the experimenter may have concerning the possibility of interaction
in the minor variables. Should he feel particularly anxious about a possible
interaction between two minor variables, then he can usually arrange, by inspect-
ing tables such as that given in Equation (26), that this interaction is associated
with an unimportant interaction between the major variables. For instance,
in this present example, the interactions 1 6, 7 8, 3 4 and 2 5 between the minor
variables are all confounded with the three-factor interaction B_1B_2B_3 between
major variables. This three-factor interaction might be expected to be unim-
portant a priori. It should be noticed that so long as B_1, B_2 and B_3 are pseudo
variables representing comparisons among blocks, then interactions such as
B_1B_2B_3 will represent comparisons of precisely the same potential as are repre-
sented by the main effects B_1, B_2, etc. When, however, B_1, B_2, etc., are used
to represent real variables, main effects and interactions revert to their former
relative status.

When thirty-two runs are to be made and where there are four major vari-
ables along with sixteen minor variables, the 2^{16-11}_{IV} design may be employed.
With sixteen runs three variables and all their interactions may be investigated
by associating the block generators with these major variables and the eight
minor variables then introduced. For eight runs, two major variables and their
interaction plus four main effect variables are possible. Of course, when the
designs are used in this way no blocking is permissible. However, even here a
certain degree of flexibility is possible. For instance, for the thirty-two run
design we might wish to have only two principal factors, in which case we could
associate these with two of the block generators, using the other two block
generators to form blocks of eight. It will be clear to the reader that these arrange-
ments provide a very versatile set of designs which may be used in a variety
of circumstances.


From Resolution IV Designs to Resolution III Designs 


When the 2^{16-11}_{IV} design is used to study simultaneously four major variables
along with sixteen minor variables, a convenient notation for the design is


2^4 ⊂ 2^{16-11}_{IV}


where the symbol ⊂ is read “contained in” or “embedded in”. Thus 2^3 ⊂ 2^{8-4}_{IV}
and 2^2 ⊂ 2^{4-1}_{IV} identify the sixteen and eight run designs described above.

The construction of a resolution III design from one of resolution IV now
becomes obvious. The 2^4 ⊂ 2^{16-11}_{IV} is clearly a design for studying twenty vari-
ables in thirty-two runs. Suppose now that one of the interactions between the
major variables is used to bring in still another variable. In fact, we might be
willing to assume that all the interactions between the four major variables
are negligible and in this instance eleven new variables (one for each of the
interactions) could be introduced. The result, of course, is the 2^{31-26}_{III} design,
that is, the fully saturated resolution III design for studying thirty-one variables
in thirty-two runs.


Other Embedded Designs 
The principle of embedded fold-over pairs producing blocks of two runs
has wide application. As an example less orthodox than those mentioned above,
we note that the thirty-two run 2^{16-11}_{IV} design in sixteen blocks is one in which
a central composite design in three variables can be embedded. The composite
design [14] would employ factor combinations of three major variables consisting
of a 2^3 factorial along with six axial points and two center points for a total of
sixteen points. Each of these factor combinations would then be duplicated,
one duplicate containing one combination of versions of sixteen minor variables
and the second duplicate the mirror image of these same sixteen variables.
The additional sixteen variables would have to be such that they were not
expected to have any effect other than a linear one.

Part II of this paper will contain a discussion of Resolution V designs along
with an appendix.


BIBLIOGRAPHY 

(1) Yates, F., The Design and Analysis of Factorial Experiments, Imper. Bur. Soil Sci.
Tech. Comm. 35, 1937.

(2) Fisher, R. A., The Theory of Confounding in Factorial Experiments, in Relation to the
Theory of Groups, Ann. Eugen. 11, 1942.

(3) Daniel, C., Fractional Replication in Industrial Research, Proc. 3rd Berkeley Symp.
V., 1956.

(4) Finney, D. J., The Fractional Replication of Factorial Arrangements, Ann. Eugen.
12, 1945.

(5) Brownlee, K. A., B. K. Kelly and P. K. Loraine, Fractional Replication Arrangements
for Factorial Experiments with Factors at Two Levels, Biometrika, 35, 1948.

(6) Plackett, R. L., and J. P. Burman, The Design of Optimum Multifactorial Experiments,
Biometrika, 33, 1946.

(7) Clatworthy, W. H., W. S. Connor, Deming and M. Zelen, Fractional Factorial Experi-
ment Designs for Factors at Two Levels, U. S. Dept. Comm. Nat. Bur. of Stnds., Applied
Math. Ser. 48, 1957.

(8) Kempthorne, O., A Simple Approach to Confounding and Fractional Replication in
Factorial Experiments, Biometrika, 34, 1947.

(9) Bose, R. C. and Kishen, K., On the Problem of Confounding in the General Symmetrical
Factorial Design, Sankhya 5, 1940.

(10) Dykstra, O., Partial Duplication of Factorial Experiments, Technometrics, 1, 1959.

(11) Plackett, R. L., Some Generalizations in the Multifactorial Design, Biometrika, 33, 1946.

(12) Fisher, R. A., The Design of Experiments, Oliver & Boyd, London, 1935.

(13) Box, G. E. P., Multifactor Designs of First Order, Biometrika, 39, 1951.

(14) Box, G. E. P. & K. B. Wilson, On the Experimental Attainment of Optimum Condition,
J. Roy. Stat. Soc., B., 13, 1951.

(15) Tippett, L. H. C., “Applications of Statistical Methods to the Control of Quality in
Industrial Production”, Manchester Statistical Society, 1934.

(16) Yates, F., Complex Experiments, J. Roy. Stat. Soc., Suppl., 2, 1935.

(17) Rao, C. R., Factorial Experiments Derivable from Combinatorial Arrangements of
Arrays, J. Roy. Stat. Soc., Suppl., 9, 1947.























Vol. 3, No. 3 TECHNOMETRICS August, 1961


Partial Confounding in Fractional Replication 


W. J. Youden
National Bureau of Standards 






In a completely saturated fractional factorial the two factor interactions are 
confounded in particular subsets with the main effects. It is possible to choose two 
different completely saturated factorials with different sets of interactions con- 
founded with any given main effect. Examination of the numerical results obtained 
with each of the saturated designs will usually make it possible to identify two 
factor interactions when present. 





























HISTORICAL INTRODUCTION


The experimental designs used in field trials often involved several complete 
replications. Frequently these agricultural experiments were of a factorial type. 
Many studies were made of the effects of the nutrient elements, nitrogen, 
phosphorus, and potassium, each used at two rates of application. This $2^3$
factorial involves eight combinations of factors customarily represented by
the notation npk, np, nk, pk, n, p, k and (1). The main effect for nitrogen, N,
is based on the contrast (npk + np + nk + n) - (pk + p + k + (1)). A two-factor
interaction, such as NP, is given by the contrast (npk + np + k + (1)) -
(nk + pk + n + p). The three factor interaction is given by the contrast
(npk + n + p + k) - (np + nk + pk + (1)).
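The sign rule behind these contrasts can be sketched in a few lines of Python. This is only an illustration: the function names `label` and `contrast` are hypothetical, and the rule assumed is the usual one, that a treatment enters an effect with the product of the coded levels (+1 high, -1 low) of the factors named in that effect.

```python
from itertools import product

FACTORS = "npk"

def label(levels):
    """Treatment label: one lower-case letter per factor at its high level, '(1)' if none."""
    name = "".join(f for f, hi in zip(FACTORS, levels) if hi)
    return name or "(1)"

def contrast(effect):
    """Split the eight treatments into the plus and minus groups of an effect such as 'NP'."""
    plus, minus = [], []
    for levels in product([1, 0], repeat=len(FACTORS)):
        sign = 1
        for f, hi in zip(FACTORS, levels):
            if f.upper() in effect:
                sign *= 1 if hi else -1
        (plus if sign > 0 else minus).append(label(levels))
    return plus, minus

for effect in ("N", "NP", "NPK"):
    plus, minus = contrast(effect)
    print(effect, "= (" + " + ".join(plus) + ") - (" + " + ".join(minus) + ")")
```

The three printed contrasts contain the same treatments, with the same signs, as the three contrasts written out above.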

In a field trial all eight combinations might constitute a randomized block, 
there being as many blocks as replications. The block size could be reduced to 
four combinations by confounding the three factor interaction with the difference 
between the two blocks comprising the replication. A particularly attractive 
possibility arose if there were several replications. A different interaction could 
be confounded with each pair of blocks that made up a replication. Should four 
replications be available, the four interactions could be confounded, one in each 
replication. This meant that, while all four replicates were used to evaluate the 
main effects, each interaction was estimated using only the three replicates in 
which it was not confounded. Thus three fourths of the data was available for 
the estimation of the interactions. There was always the possibility that the 
reduction of the block size to four plots, instead of eight, would so reduce the 
error of the within block comparisons that the three replicates would actually 
give better estimates than would four replicates using the larger blocks. 







FRACTIONAL REPLICATION 


The device of partial confounding of the type just described was associated
with the availability of two or more complete replications (3). The advent of
fractional replication (2) directed attention to another type of confounding.
If just one of the two blocks confounding the three factor interaction is actually
used, then only half a replicate is available. In the case of the $2^3$ factorial, the
combinations used might be npk, n, p, and k. If an attempt is made to evaluate
the main effect of a factor, it will be found to be completely confounded with
the interaction of the other two factors. Fortunately, if there are enough factors,
it soon becomes possible to select a half replicate that avoids confounding main
effects with two factor interactions. A half replicate of a $2^5$ factorial confounds
the main effects with the four factor interactions, and the two factor interactions
with three factor interactions. Confounding main effects with two factor interactions
is undesirable because it may give misleading estimates of the main
effect and, at the same time, cause interactions to be overlooked.


TABLE 1
Two different 1/16 fractions of a $2^7$ factorial

Set 1                           Set 2
Main effects confounded with    Main effects confounded with
positive interactions           negative interactions

abcdefg                         abde
abd                             abfg
ace                             acdg
afg                             acef
bcf                             bcdf
beg                             bceg
cdg                             defg
def                             (1)
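The confounding in a half replicate can be read off from its defining relation. The sketch below is illustrative only: the helper `alias` is a hypothetical name, the relation I = NPK corresponds to the fraction npk, n, p, k mentioned above, and I = ABCDE is assumed here as the defining relation of the $2^5$ half replicate, since the text does not state it.

```python
def alias(effect, defining_word):
    """Alias of `effect` under the defining relation I = defining_word.

    Multiplying two effect words cancels every letter that appears in both,
    so the alias consists of the letters found in exactly one of them.
    """
    letters = set(effect) ^ set(defining_word)
    return "".join(sorted(letters)) or "I"

# 2^3 half replicate {npk, n, p, k}: defining relation I = NPK.
print("N  is confounded with", alias("N", "NPK"))

# 2^5 half replicate, assuming the defining relation I = ABCDE.
print("A  is confounded with", alias("A", "ABCDE"))
print("AB is confounded with", alias("AB", "ABCDE"))
```

Under these assumptions a main effect of the $2^5$ half replicate pairs with a four factor interaction and a two factor interaction with a three factor interaction, as described above.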

The process of fractionation may be carried to such an extreme that all main
effects are confounded with two factor interactions (4). The eight combinations of
a 1/16 fraction of a $2^7$ factorial are shown as Set 1 in the first column of Table
1. This extreme fractionation reduces the program to a class of designs known
as weighing designs (5). Each one of the seven main effects is confounded with
three of the 21 two factor interactions. If these two factor interactions can be
assumed nonexistent or negligible, the main effect may be estimated by appropriate
contrasts of the type


A = (abcdefg + abd + ace + afg - beg - bcf - def - cdg)


The left half of Table 2 shows the three two-factor interactions confounded
with each main effect of Set 1. If the experimenter is willing to run an additional
1/16 fraction it has been proposed (1) that a matching set made by reversing
the factor levels has an advantage. Thus combination

abcdefg would be replaced by (1),
abd by cefg,
beg by acdf, and so on.


In that event the main effects, estimated on the combined 16 observations, 
are no longer confounded with the two-factor interactions. The sign of these 
interactions is reversed in the second fraction, and therefore the interactions 
drop out when the two fractions are combined. 
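The cancellation can be checked directly on Set 1 of Table 1. The following is a minimal sketch; the helper names `mirror`, `sign` and `overlap` are hypothetical, and the entry printed as "bef" in some copies of Table 1 is taken to be bcf, which is what the alias pattern B = CF requires.

```python
from itertools import combinations

FACTORS = "abcdefg"
SET1 = ["abcdefg", "abd", "ace", "afg", "bcf", "beg", "cdg", "def"]

def mirror(treatment):
    """Reverse every factor level: letters present become absent and vice versa."""
    rest = "".join(f for f in FACTORS if f not in treatment)
    return rest or "(1)"

def sign(treatment, effect):
    """Product of the coded levels (+1 high, -1 low) of the factors in `effect`."""
    s = 1
    for f in effect:
        s *= 1 if f in treatment else -1
    return s

def overlap(runs, main, pair):
    """Sum over runs of (main effect sign) x (interaction sign); 0 means unconfounded."""
    return sum(sign(t, main) * sign(t, pair) for t in runs)

FOLD = [mirror(t) for t in SET1]
PAIRS = ["".join(p) for p in combinations(FACTORS, 2)]

print(FOLD)
print(overlap(SET1, "a", "bd"), overlap(FOLD, "a", "bd"), overlap(SET1 + FOLD, "a", "bd"))
```

In Set 1 alone the column for A coincides with the column for BD; in the reversed fraction it is its negative, so over the combined sixteen runs every main effect column is orthogonal to every two-factor interaction column.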




TABLE 2
Two factor interactions confounded with main effects

                 Two factor interactions
Main effect      Set 1            Set 2

A                BD, CE, FG       BC, DF, EG
B                AD, CF, EG       AC, DG, EF
C                AE, BF, DG       AB, DE, FG
D                AB, CG, EF       AF, BG, CE
E                AC, BG, DF       AG, BF, CD
F                AG, BC, DE       AD, BE, CG
G                AF, BE, CD       AE, BD, CF


The main effects should be calculated separately for each fraction. A sub- 
stantial contrast of the same sign in each fraction supports the conclusion that 
a main effect is present. If both the separate estimates for any given main effect 
are substantial but of opposite sign, this is equally good evidence that at least 
one of the three two-factor interactions confounded with this main effect is not 
negligible. Sometimes a guess might be made as to which of the three is re- 
sponsible. Thus, if main effects appear present for factors A and B, and the 
calculated main effect for D is large and plus in one fraction and large and 
negative in the other fraction, the natural thing would be to pick out the inter- 
action AB from the trio, EF, AB, and CG. Generally interactions involve at 
least one factor that has a main effect and here both are present. The estimate 
for the interaction would be obtained by taking the difference between the two 
estimates of the main effect for D. 


PARTIAL CONFOUNDING OF FRACTIONAL REPLICATIONS 


If a second 1/16 fraction is under consideration, there is an opportunity to do 
something more than reverse the levels of the factors as used in the first 1/16 
fraction. An entirely different 1/16 fraction is selected and the levels of this 
fraction reversed. A 1/16 fraction is needed that confounds the main effects 
with altogether different groupings of the two factor interactions. Such a solu- 
tion is given in the second column of Table 1. The interactions confounded 
with the main effect in Set 2 are shown in Table 2. Inspection of Set 2 shows 
that, for a given main effect, the trio of interactions have none in common 
with the trio listed for Set 1. The advantage of this device is that an interaction, 
if present, appears as two different main effects in the two sets and may be 
positively identified. This assumes a fairly simple situation with not too many 
main effects and interactions to muddy the pattern. 
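The alias structure of Table 2 can be recovered mechanically from the treatment combinations of Table 1. The sketch below is illustrative; `sign`, `aliases` and `pairing` are hypothetical names, and the garbled entries of Table 1 are read as bcf, bcdf and bceg, the only readings consistent with Table 2.

```python
from itertools import combinations

FACTORS = "abcdefg"
SET1 = ["abcdefg", "abd", "ace", "afg", "bcf", "beg", "cdg", "def"]
SET2 = ["abde", "abfg", "acdg", "acef", "bcdf", "bceg", "defg", "(1)"]

def sign(treatment, effect):
    """Product of the coded levels (+1 high, -1 low) of the factors in `effect`."""
    s = 1
    for f in effect:
        s *= 1 if f in treatment else -1
    return s

def aliases(runs):
    """Map each main effect to the interactions whose column equals +/- its own column."""
    table = {}
    for m in FACTORS:
        found = []
        for p in combinations(FACTORS, 2):
            pair = "".join(p)
            total = sum(sign(t, m) * sign(t, pair) for t in runs)
            if abs(total) == len(runs):
                found.append((pair.upper(), "+" if total > 0 else "-"))
        table[m.upper()] = found
    return table

T1, T2 = aliases(SET1), aliases(SET2)

# For every interaction, the main-effect contrast it loads on in each set.
pairing = {}
for main, found in T1.items():
    for pair, _ in found:
        pairing[pair] = [main]
for main, found in T2.items():
    for pair, _ in found:
        pairing[pair].append(main)

for main in T1:
    print(main, [p for p, _ in T1[main]], [p for p, _ in T2[main]])
print(pairing["AB"])
```

Each of the 21 interactions maps to a different (Set 1 contrast, Set 2 contrast) pair, which is what allows an interaction to be traced from the two sets of results.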

If factors A and B have real main effects and these factors also interact, the
contrast for D in Set 1 will reflect this interaction and the contrast for C in
Set 2 should be nearly equal and opposite in sign. If the data show this state
of affairs, identification of the interaction is on a firm basis. Every one of the 21
interactions has a unique pairing of main effects associated with it. Experimenters
may, under certain circumstances, find it advantageous to identify
two factor interactions.


TABLE 3
Examples using synthetic data constructed from known effects and normal deviates

                    Example I      Example II     Example III

σ                   1.0            1.0            1.0
Effects built in    A = 2.0        A = 2.0        B = 2.0
                    B = 1.5        C = -2.0       E = -2.4
                    AB = -1.5      F = 1.6        G = 2.6
                                   CF = -1.8      BG = -2.2
μ                   5.0            5.0            6.0

Data, Set 1
abcdefg             6.05           ...            4.25
abd                 5.14           ...            ...
ace                 6.95           ...            ...
afg                 5.53           ...            ...
bcf                 6.28           ...            ...
beg                 7.32           ...            ...
cdg                 2.93           ...            ...
def                 ...            ...            ...

Data, Set 2
(1)                 ...            ...            ...
defg                ...            ...            ...
acdg                ...            ...            ...
abfg                ...            ...            ...
bceg                ...            ...            ...
bcdf                ...            ...            ...
abde                ...            ...            ...
acef                ...            ...            ...

              Example I           Example II          Example III
Contrast      Set 1     Set 2     Set 1     Set 2     Set 1     Set 2

A              5.95      4.84      5.18      7.35      1.70     -1.24
B              8.19      6.46    -10.52      3.37      4.44      5.06
C              3.03      6.42    -11.14     -7.17     -0.98     -1.36
D            -10.77      1.76     -3.06      3.47     -3.10     10.54
E              1.63      0.54      4.64      2.53    -21.88    -13.00
F             -3.29      1.52      2.86      5.73     -1.82      4.06
G              2.27     -3.30      3.04      6.03      9.36     12.26


EXAMPLES OF PARTIAL CONFOUNDING 


Table 3 shows three examples of constructed data. All the examples include
random normal deviates, σ = 1.0, and rather modest main effects and interactions
as listed at the top of the table. The 'data' are shown in the center
section and the contrasts calculated from the data are given in the last seven
rows of the table.

Example I shows large positive effects for A and B in both sets suggesting
main effects for A and B. The negative contrast for D in Set 1 and the positive
contrast for C in Set 2 suggest a negative interaction. Examination of Table 2
shows that the interaction AB would appear in these contrasts. The order of
the signs, negative in Set 1, positive in Set 2, indicates a negative sign for the
interaction.
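The Set 1 contrasts of Example I can be recomputed from the observations. The sketch below is an illustration under two assumptions: the seven legible values are assigned to the treatments in the order listed in Table 3, and the observation for def, which is not legible in this copy, is taken as 1.19, the value that makes the A contrast equal the printed 5.95. The function name `contrast` is hypothetical.

```python
SET1 = ["abcdefg", "abd", "ace", "afg", "bcf", "beg", "cdg", "def"]

# Example I, Set 1 observations. The last value (def) is an assumption:
# it is not legible in the source and was chosen to reproduce contrast A = 5.95.
Y = [6.05, 5.14, 6.95, 5.53, 6.28, 7.32, 2.93, 1.19]

def contrast(factor, runs, y):
    """Sum of the observations with `factor` high minus the sum with it low."""
    return sum(v if factor in t else -v for t, v in zip(runs, y))

contrasts = {f.upper(): round(contrast(f, SET1, Y), 2) for f in "abcdefg"}
for name, value in contrasts.items():
    print(name, value)
```

With that single assumed value, all seven computed contrasts agree with the Set 1 column printed for Example I, including the large negative D contrast that carries the AB interaction.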

In example II, factor C has the two largest contrasts, both negative, and this 
points to a negative main effect for C. The two plus contrasts for A similarly 
indicate a positive main effect for A. The large negative contrast for B in Set 1 
is coupled with a positive contrast in Set 2. This suggests looking at those inter- 
actions that involve B in the first set. These interactions are (see Table 2) 
EG, AD, and CF. Interaction EG would require a plus contrast for A in Set 2, 
AD would show a plus contrast for F in Set 2, and CF would appear as a plus 
contrast for G in Set 2. The choice is probably between F and G. One might
easily conclude that there is a small main effect for G and a negative interaction
AD. These conclusions are incorrect but the two effects introduced for
F and CF were quite small in comparison with σ and consequently did not
show clearly in the results.

Example III assumed somewhat larger effects. The two large negative contrasts
for E indicate a negative main effect for E. A positive main effect for G
is strongly suggested also. Set 2 has a large positive contrast for D. Interactions
that involve D in Set 2 are CE, BG, and AF. Interaction CE would give an opposite
sign for A in Set 1. The sign for contrast A is not negative and the contrast
is negligible anyway. Interaction BG would give an opposite (negative) sign
for contrast E in Set 1. The contrast is negative and much larger than the E
contrast in Set 2, suggesting that E in Set 1 has been reinforced by the negative
interaction BG. The remaining interaction, AF, would require a negative contrast
for G in Set 1. This would have reduced the contrast by cancelling out
the main effect G. However the G contrast is large and positive. The evidence
therefore points to the negative interaction BG. The B contrast in both sets
is large and positive and points to a possible main effect for B. This was the
smallest of all the effects introduced.


SOME TYPES OF PATTERNS SHOWN BY THE CONTRASTS


The joint presence of one main effect and one interaction can give rise to 
three distinct types of patterns. The word pattern refers to the number and 
orientation of the observed contrasts. The combination of signs ++ indicates 
that items confounded on a contrast reinforce each other. The combination + — 
indicates the confounded items work against each other and may cancel out. 


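[Table: contrast patterns in Set 1 and Set 2 for one main effect and one interaction. The sign entries are not recoverable from the source.]

If there are two main effects and an interaction, the following patterns may
result.

[Table: contrast patterns in Set 1 and Set 2 for the three cases A, B, AB; A, B, AC; and A, B, AD. The sign entries are not recoverable from the source.]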


Discussion 


The partial confounding of main effects with fractional replicates does not 
give something for nothing nor does it solve all the problems of the experimenter. 
The main effects are no longer completely unconfounded with two-factor inter- 
actions. Instead each main effect is partially confounded with six two-factor 
interactions. Controlled partial confounding does show very clearly the price 
that is paid, a price that is inevitable when confounding is introduced. Consider 
an outcome that indicates that factor A has a substantial negative main effect 
and that factors B and D interact. The interaction BD and the main effect for
A are confounded in the contrast labeled A in Set 1. The main effect A must
now be estimated by the contrast A in Set 2 and the interaction BD by the
contrast G in Set 2. In this instance, half the information is lost by this partial 
confounding but the interaction has been identified. The experimenter must 
choose what he wants. This, indeed, is the real art of experimental design. 


REFERENCES 


(1) G. E. P. Box and K. B. Wilson, (1951), On the Experimental Attainment of Optimum
Conditions, Jour. Royal Statistical Soc. Ser. B., Vol. XIII.

(2) D. J. Finney, (1945), The Fractional Replication of Factorial Arrangements, Ann. Eugen.
XII, p. 291.

(3) R. A. Fisher, (1942), The Theory of Confounding in Factorial Experiments in Relation
to the Theory of Groups, Ann. Eugen. XI, p. 341.

(4) R. L. Plackett and J. P. Burman, (1946), Design of Optimum Multifactorial Experiments,
Biometrika Vol. 33, p. 305.

(5) F. Yates, (1935), Complex Experiments, Jour. Royal Statistical Soc., Ser. B, Vol. II,
p. 210.







Vol. 3, No. 3 TECHNOMETRICS August, 1961


Finding New Fractions of Factorial 
Experimental Designs 


R. E. Fry 


Central Electricity Generating Board Research & Development Department 
London, England 


A method of obtaining symmetrical balanced fractions of $3^n$ and $2^m 3^n$ factorial
designs is proposed, based on an analysis of such designs into a complex of concentric
hyperspheres in an n-dimensional factor space. Two examples are constructed,
a half-replicate of a $3^4$ design and a half-replicate of a $2^3 3^2$ design. Analysis shows
both designs to have useful properties and to be relatively easy to analyse. Comparison
is made with a half-replicate of a $2^3 3^2$ design recently published by W. S.
Connor.


1. INTRODUCTION 


One of the main drawbacks of the simple full replicate multifactorial design 
is the way in which it rapidly grows to an unmanageable size as the number 
of factors increases. Before the development of fractional designs, this feature 
often precluded, on the grounds of expense and time, any designs with a large 
number of factors. And yet the ability to handle large numbers of factors simul- 
taneously, in an orderly predetermined manner, and with an unambiguous 
method of interpretation, is one of the principal advantages in all other respects, 
conferred by the factorial design structure. It is particularly important now- 
adays in the preliminary, factor-screening stages of modern industrial experi- 
mentation. 

The situation was considerably improved by the discovery [1] and development
[2] of the fractional replicates of $p^n$ designs where p is a prime number or a
power of a prime. These designs, and notably the fractions of $2^n$ factorials,
have had a remarkable impact and are undoubtedly among the most useful
and widely applied of all designs today.

Nevertheless the main problem is not yet fully resolved. Except in the rare
cases where interactions can be ignored, only the fractions of $2^n$ designs offer a
reasonably extensive range of designs which are of moderate size. Many experimenters
still feel the need of designs with factors having more than two levels,
if only to detect some degree of curvature in the main effects. The existing
fractions of $3^n$ designs are not sufficient, and in particular, development of
fractions of $2^m 3^n$ mixed level designs has been much needed. The work of W. S.
Connor, of which some examples [3], [4] have just been published, and whose
book [5] is now in the press, promises to fill the need.

It was under the pressure of such needs that the random balance designs
[6], [7] which have aroused so much controversy [8] were developed. In the
absence of a wider range of fractional designs than has been hitherto available,
methods such as these appear to have been very valuable in certain complex
industrial experiments where orthodox designs did not apply. However, the
price paid for their high degree of flexibility is the difficulty in deriving
a standardised, simple, yet detailed method of analysis, such methods having
always been one of the principal virtues of orthodox designs. It is likely that
even those who have found random balance useful would welcome any developments
combining flexibility with the simpler methods of analysis resulting from
a more systematic design structure.

Interest in the problem is growing considerably and we can expect much work
to be done upon it in the near future. The present paper describes some limited
investigations in this direction aimed at deriving some new symmetrical balanced
fractions of $3^n$ and $2^m 3^n$ designs. Two particular examples, a "half-replicate"
of a $3^4$ design (42 out of 81 trials) and a half-replicate of a $2^3 3^2$ design (36 out
of 72 trials) are given, and their properties analysed. The latter design is compared
with the closely similar design published by Connor [3].


2. FRACTIONS OF $3^n$ DESIGNS


One way of widening the concept of a factorial design (especially when all 
factors are quantitative and continuously variable) is to abandon the restriction 
of factor values to a small fixed number of levels. This approach has been ap- 
plied in the response surface designs used sequentially as optimisation procedures, 
and originally developed by Box and Wilson [8]. In such designs, the design 
points are no longer restricted to the intersections of a hypercubic lattice in the 
factor space, but are arranged freely in any pattern which confers the desired 
properties. 

However the designs to be discussed here are true fractional replicates in 
that they consist simply of a selection from the totality of treatment combinations
defined by single replicates of conventional $3^n$ and $2^m 3^n$ factorial designs.
This type of fraction will probably always be necessary, firstly because not all 
factors will be quantitative and continuously variable, and secondly because 
there are often good practical reasons for wanting to restrict the factor levels 
to a small number of fixed values. 

There seems to be no very obvious way of deriving new fractions by a chain 
of mathematical reasoning. In any case, it is not particularly easy to decide on 
a detailed set of design properties which are both desirable and possible. An 
alternative approach is the empirical one of selecting fractions according to 
some principle which will give them the right sort of symmetry and factor 
balance, and then investigating their properties. The basic idea of the present 
paper originated from a study of $3^n$ designs from this point of view. In a $3^n$
design, suppose each treatment combination to be represented in the conventional
fashion by a code containing n digits. The kth digit represents the level of
the kth factor, and can take the values -1, 0, 1 representing the three levels.
Then each such code may be thought of as the co-ordinates of a point in an
n-dimensional factor space, and the whole design consists of $3^n$ such points
arranged in a hypercubic lattice. Now it can easily be demonstrated that this
array of points can also be thought of as a centre point (0, 0, 0, ..., 0) surrounded
by n sub-sets of points, each of which lies on the surface of one of n concentric
hyperspheres centred on the point (0, 0, 0, ..., 0). This is because any treatment
combination which contains r non-zero digits (each of modulus unity) must
lie on a hypersphere whose radius is $r^{1/2}$. Since all the treatment combinations
must fall into this category, and since r can only take the values from 1 to n,
the point is proved.

In fact it is equally simple to show that any $p^n$ design, where p is any integer,
may be broken down into a number of concentric hyperspheres.

In the case of a $3^n$ design, this sub-division of the design points can be represented
algebraically by the expansion

$$3^n = (1 + 2)^n = 1 + \binom{n}{1} 2 + \binom{n}{2} 2^2 + \cdots + 2^n \qquad (1)$$

This expansion is closely related to a device used by Morrison [9] in his work
on fractional replication of mixed-level designs. An important point demonstrated
by this expansion is the fact that the number of design points on the
kth hypersphere (k = 0 to n, where k = 0 represents the centre point) is
$\binom{n}{k} 2^k$, which can be formally considered as an array of $\binom{n}{k}$ two-level factorial
designs each with k factors. For example, in the second hypersphere of a $3^3$
design, there are three $2^2$ designs as shown in Table 1.
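The count of points on each hypersphere is easy to confirm by enumeration. This is a small sketch; the function name `shells` is hypothetical, and the hypersphere index of a point is taken as its number of non-zero coordinates, which equals its squared distance from the centre.

```python
from collections import Counter
from itertools import product
from math import comb

def shells(n):
    """Count the points of a 3^n design by their number of non-zero coordinates."""
    return Counter(sum(1 for x in pt if x != 0) for pt in product((-1, 0, 1), repeat=n))

n = 3
counts = shells(n)
for k in range(n + 1):
    print(k, counts[k], comb(n, k) * 2**k)

# The second hypersphere of the 3^3 design: twelve points, i.e. three 2^2 designs.
second_shell = [pt for pt in product((-1, 0, 1), repeat=3) if sum(map(abs, pt)) == 2]
print(second_shell)
```

For n = 3 the enumeration gives 1, 6, 12 and 8 points, which sum to 27 as the expansion requires.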

This geometrical way of considering the $3^n$ design suggests two methods of
selecting symmetrical balanced fractions whose properties it would be of interest
to investigate. The first method is the omission of a sub-set of hyperspheres.
The second method is the extraction of standard fractional replicates of the $2^k$
designs which lie on each hypersphere. In the remainder of this section, the
first of these methods will be demonstrated by application to a $3^4$ design. In
section 3, the second method will be shown to be applicable to mixed level
$2^m 3^n$ designs.

With the first method, one approach that immediately springs to mind is the
omission of alternate hyperspheres. The designs so obtained have an interesting
connection with earlier efforts by the writer to derive fractions of $3^n$ designs
by another method. It is well known that by the use of orthogonal polynomials
of equally-spaced quantitative factor-levels, the total sum of squares in a $3^n$
design can be divided into ($3^n$ - 1) components each with one degree of freedom
which are mutually orthogonal under the usual model. These components
correspond to the linear and quadratic effects of the factors, and the various
interactions between these linear and quadratic effects. Every component,
having only one degree of freedom, corresponds to a particular linear contrast
of the observations, and thus represents a division of some or all of the observations
into two groups. By analogy with the conventional methods of confounding,
it seemed of interest to analyse the properties of a fractional design obtained
by taking one of the halves resulting from such a dichotomy. It turns out that
if the component representing the interaction between the quadratic effects of
all the factors is confounded, then the resulting fraction is identical with that
obtained by the omission of alternate hyperspheres.


TABLE 1
Second hypersphere of a $3^3$ design

(-1, -1,  0)    (-1,  0, -1)    ( 0, -1, -1)
(-1,  1,  0)    (-1,  0,  1)    ( 0, -1,  1)
( 1, -1,  0)    ( 1,  0, -1)    ( 0,  1, -1)
( 1,  1,  0)    ( 1,  0,  1)    ( 0,  1,  1)


TABLE 2
Half-replicate (41 trials) of $3^4$ design obtained by confounding $A_2B_2C_2D_2$

a1b1c1d1   a1b1c1d3   a1b1c3d1   a1b1c3d3   a1b3c1d1   a1b3c1d3
a1b3c3d1   a1b3c3d3   a3b1c1d1   a3b1c1d3   a3b1c3d1   a3b1c3d3
a3b3c1d1   a3b3c1d3   a3b3c3d1   a3b3c3d3   a1b1c2d2   a1b3c2d2
a3b1c2d2   a3b3c2d2   a1b2c1d2   a1b2c3d2   a3b2c1d2   a3b2c3d2
a1b2c2d1   a1b2c2d3   a3b2c2d1   a3b2c2d3   a2b1c1d2   a2b1c3d2
a2b3c1d2   a2b3c3d2   a2b1c2d1   a2b1c2d3   a2b3c2d1   a2b3c2d3
a2b2c1d1   a2b2c1d3   a2b2c3d1   a2b2c3d3   a2b2c2d2

To take the example of a $3^4$ design, let the treatment combinations on this
occasion be represented by the codes $a_i b_j c_k d_l$ where the suffixes can assume the
values 1, 2 or 3 representing the three levels of each factor. Then the contrast
representing the interaction between all quadratic effects, i.e. $A_2B_2C_2D_2$ in the
standard notation, is given by the expansion of the expression

$$(a_1 - 2a_2 + a_3)(b_1 - 2b_2 + b_3)(c_1 - 2c_2 + c_3)(d_1 - 2d_2 + d_3)$$

The terms in this expression with positive coefficients represent the fraction
that is required and they are listed in Table 2. If this table is now converted
into our previous notation in which levels are represented by the digits -1, 0
and 1, then Table 3 results, which is easily confirmed to be the design obtained
by omitting the first and third hyperspheres.
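The equivalence of the two constructions can be confirmed by enumeration. In this sketch the names `QUAD`, `CODE` and `coefficient` are hypothetical; a term's coefficient is the product of the factors 1, -2, 1 attached to suffixes 1, 2, 3, and suffix 2 is mapped to the coded level 0.

```python
from itertools import product

QUAD = {1: 1, 2: -2, 3: 1}   # coefficients in (x_1 - 2 x_2 + x_3)
CODE = {1: -1, 2: 0, 3: 1}   # suffix -> coded factor level

def coefficient(suffixes):
    """Coefficient of the term a_i b_j c_k d_l in the expanded product."""
    c = 1
    for s in suffixes:
        c *= QUAD[s]
    return c

# Construction 1: terms with positive coefficient in the A2B2C2D2 contrast.
by_contrast = {
    tuple(CODE[s] for s in sfx)
    for sfx in product((1, 2, 3), repeat=4)
    if coefficient(sfx) > 0
}

# Construction 2: centre point plus the second and fourth hyperspheres.
by_spheres = {
    pt for pt in product((-1, 0, 1), repeat=4)
    if sum(map(abs, pt)) in (0, 2, 4)
}

print(len(by_contrast), by_contrast == by_spheres)
```

A coefficient is positive exactly when an even number of suffixes equal 2, that is, when the point has an even number of zero coordinates, which is why the two sets coincide.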

To give a balanced design in which all three levels of each factor occur an 
equal number of times, it is necessary to duplicate the centre point, giving a 
design of 42 trials. The design properties will now be analysed and the design 
shown to have reasonably high orthogonality between the more important 
effects. To simplify the discussion, the case of quantitative factors with equally 
spaced levels is chosen. It is convenient for the purposes of demonstrating 
these properties to assume a model which is a linear function of orthogonal 
polynomials of the factor levels. Two-factor interactions will be assumed to be 
the only ones of any importance or magnitude. This is a plausible assumption in 
many experimental situations. The linear effects of the four factors will be
denoted by the symbols $A_1$, $B_1$, $C_1$ and $D_1$. These are in fact the regression
coefficients in the model, of the four orthogonal polynomials $a_1$, $b_1$, $c_1$ and $d_1$,
which each take the values

-1 at the first level of the factor
 0 at the second level of the factor
 1 at the third level of the factor





TABLE 3
Half-replicate (41 trials) of $3^4$ design obtained by omitting first and third hyperspheres

Centre-point

( 0,  0,  0,  0)

2nd hypersphere

(-1, -1,  0,  0)   (-1,  0, -1,  0)   (-1,  0,  0, -1)
(-1,  1,  0,  0)   (-1,  0,  1,  0)   (-1,  0,  0,  1)
( 1, -1,  0,  0)   ( 1,  0, -1,  0)   ( 1,  0,  0, -1)
( 1,  1,  0,  0)   ( 1,  0,  1,  0)   ( 1,  0,  0,  1)

( 0, -1, -1,  0)   ( 0, -1,  0, -1)   ( 0,  0, -1, -1)
( 0, -1,  1,  0)   ( 0, -1,  0,  1)   ( 0,  0, -1,  1)
( 0,  1, -1,  0)   ( 0,  1,  0, -1)   ( 0,  0,  1, -1)
( 0,  1,  1,  0)   ( 0,  1,  0,  1)   ( 0,  0,  1,  1)

4th hypersphere

(-1, -1, -1, -1)   ( 1, -1, -1, -1)
(-1, -1, -1,  1)   ( 1, -1, -1,  1)
(-1, -1,  1, -1)   ( 1, -1,  1, -1)
(-1, -1,  1,  1)   ( 1, -1,  1,  1)
(-1,  1, -1, -1)   ( 1,  1, -1, -1)
(-1,  1, -1,  1)   ( 1,  1, -1,  1)
(-1,  1,  1, -1)   ( 1,  1,  1, -1)
(-1,  1,  1,  1)   ( 1,  1,  1,  1)

Similarly the quadratic effects are denoted by the symbols $A_2$, $B_2$, $C_2$ and $D_2$,
these being the regression coefficients of the orthogonal polynomials $a_2$, $b_2$, $c_2$
and $d_2$ which each take the values

 1 at the first level of the factor
-2 at the second level of the factor
 1 at the third level of the factor


Interaction effects are denoted by symbols of the type $(A_iB_j)$ (i, j = 1, 2) and
are the coefficients of orthogonal polynomials $a_i b_j$ which are obtained simply
as the product of $a_i$ and $b_j$. (Note that in contrast, the symbol $(A_iB_j)$ does not
represent the product of $A_i$ and $B_j$.) To avoid writing an excessively long equation,
the complete model is illustrated for the case of two factors by the following
equation

$$y = m + A_1 a_1 + B_1 b_1 + A_2 a_2 + B_2 b_2 + (A_1B_1) a_1 b_1 + (A_2B_1) a_2 b_1 + (A_1B_2) a_1 b_2 + (A_2B_2) a_2 b_2 + \epsilon \qquad (2)$$

where y is the observed response, m is a constant term and $\epsilon$ is the random
error term. This equation is easily generalised to cases with more than two factors.

A least squares analysis of the design under consideration using this model
shows that, as would be expected from the method of construction, the interactions
between the quadratic factor effects are heavily confounded according
to the scheme

$(A_2B_2) = (C_2D_2)$, $(A_2C_2) = (B_2D_2)$, $(A_2D_2) = (B_2C_2)$.

However experience shows that interactions between quadratic effects are
most often negligible. If they are omitted from the model, it is found that the
only correlation left is between the quadratic components of the main effects,
and is of a very low level, the correlation coefficient between the least squares
estimates of any two such effects as $A_2$ and $B_2$ being only -(1/9). All other
components in the model are orthogonal to each other and to the group $A_2$,
$B_2$, $C_2$, $D_2$. Thus with the exception of this latter group of effects, the estimates
of regression coefficients and their corresponding sum of squares are computed
in the conventional manner used for completely orthogonal designs. In the
case of the group $A_2$, $B_2$, $C_2$ and $D_2$, the normal equations for the estimates
$\hat{A}_2$, $\hat{B}_2$, $\hat{C}_2$ and $\hat{D}_2$ turn out to be

$$\sum y a_2 = 84\hat{A}_2 + 12\hat{B}_2 + 12\hat{C}_2 + 12\hat{D}_2$$
$$\sum y b_2 = 12\hat{A}_2 + 84\hat{B}_2 + 12\hat{C}_2 + 12\hat{D}_2$$
$$\sum y c_2 = 12\hat{A}_2 + 12\hat{B}_2 + 84\hat{C}_2 + 12\hat{D}_2$$
$$\sum y d_2 = 12\hat{A}_2 + 12\hat{B}_2 + 12\hat{C}_2 + 84\hat{D}_2$$


where summation is over all the observations. Equations with this particular
type of matrix of coefficients are easily solved by the use of the following lemma.

Lemma: If M is a symmetrical square matrix of order n with all elements in
the main diagonal equal to u and all other elements equal to v, then $M^{-1}$ has
all elements in its main diagonal equal to w where

$$w = \frac{u + (n - 2)v}{[u + (n - 1)v][u - v]}$$

and all other elements equal to x where

$$x = \frac{-v}{[u + (n - 1)v][u - v]}$$

In the case of the normal equations given above, the inverse of the matrix of
coefficients is as follows

$$\begin{bmatrix}
9/720 & -1/720 & -1/720 & -1/720 \\
-1/720 & 9/720 & -1/720 & -1/720 \\
-1/720 & -1/720 & 9/720 & -1/720 \\
-1/720 & -1/720 & -1/720 & 9/720
\end{bmatrix}$$

This shows the estimating formulae are of the type

$$\hat{A}_2 = (1/720)\left(9 \sum y a_2 - \sum y b_2 - \sum y c_2 - \sum y d_2\right), \text{ etc.}$$
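Both the coefficients 84 and 12 of the normal equations and the entries of the inverse can be checked numerically. The sketch below rebuilds the 42-run design (the 41 points plus a duplicated centre point) and applies the lemma with exact fractions; the names `quad` and `patterned_inverse` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# 41 points with an even number of zero coordinates, plus a second centre point.
points = [pt for pt in product((-1, 0, 1), repeat=4) if sum(map(abs, pt)) in (0, 2, 4)]
design = points + [(0, 0, 0, 0)]

def quad(x):
    """Quadratic orthogonal polynomial: 1, -2, 1 at coded levels -1, 0, 1."""
    return -2 if x == 0 else 1

# Matrix of the normal equations for the four quadratic effects.
Q = [[quad(x) for x in run] for run in design]
M = [[sum(row[i] * row[j] for row in Q) for j in range(4)] for i in range(4)]

def patterned_inverse(u, v, n):
    """Diagonal (w) and off-diagonal (x) elements of the inverse, as given by the lemma."""
    d = (u + (n - 1) * v) * (u - v)
    return Fraction(u + (n - 2) * v, d), Fraction(-v, d)

w, x = patterned_inverse(M[0][0], M[0][1], 4)
print(M[0], w, x, x / w)
```

The diagonal element w = 9/720 and the off-diagonal element x = -1/720 reproduce the inverse matrix above, and their ratio gives the correlation -(1/9) between any two of the estimates.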
Moreover since from standard regression theory the elements of the inverse 
matrix when multiplied by the residual variance of the model, give the variances 
(main diagonal) and covariances of the estimates, it is easily verified that the 
variance of each estimate is (9/720) o°(c” = residual variance) and that the 
correlation coefficient between any two estimates is —(1/9). The residual 
variance is estimated by difference from the analysis of variance table as shown 




















NEW FRACTIONS OF FACTORIAL EXPERIMENTAL DESIGNS 


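TABLE 4
Analysis of Variance of a Half-replicate of a $3^4$ design

Source of Variation           Degrees of Freedom    Sum of Squares

$A_1$                         1                     $\hat{A}_1 \sum y a_1$
$B_1$                         1                     $\hat{B}_1 \sum y b_1$
$C_1$                         1                     $\hat{C}_1 \sum y c_1$
$D_1$                         1                     $\hat{D}_1 \sum y d_1$
$A_2$, $B_2$, $C_2$, $D_2$    4                     $\hat{A}_2 \sum y a_2 + \hat{B}_2 \sum y b_2 + \hat{C}_2 \sum y c_2 + \hat{D}_2 \sum y d_2$
$(A_1B_1)$                    1                     $(\widehat{A_1B_1}) \sum y a_1 b_1$
$(A_1B_2)$                    1                     $(\widehat{A_1B_2}) \sum y a_1 b_2$
$(A_2B_1)$                    1                     $(\widehat{A_2B_1}) \sum y a_2 b_1$
etc.                          etc.
Error                         15                    By difference
Total                         41                    $\sum y^2 - (1/42)(\sum y)^2$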


in Table 4. To reduce the size of the table, only one interaction is listed, to 
indicate the general method of calculation for the others. The estimates of the 
orthogonal effects are calculated in the standard fashion 


e.g. $(\widehat{A_1B_2}) = \sum y a_1 b_2 / \sum (a_1 b_2)^2 = \sum y a_1 b_2 / 52$.
In view of the low correlation between $\hat{A}_2$, $\hat{B}_2$, $\hat{C}_2$ and $\hat{D}_2$ there is little practical
value in constructing a simultaneous confidence interval, and it is sufficient to
calculate individual confidence intervals for each effect. For example, the 95%
confidence limits for $A_2$ are

$$\hat{A}_2 \pm (2.131)(0.1118)\sigma = \hat{A}_2 \pm 0.238\sigma$$

where 2.131 is the 95% point of Student's $t$ with 15 degrees of freedom, and
$0.1118\sigma$ is the standard error of $\hat{A}_2$. If such confidence limits do not enclose
zero, then the estimate is said to be significantly different from zero at the 
5% level. 
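The numerical constants in the confidence limits follow directly from the inverse matrix. The sketch below recomputes them; the tabulated $t$ value 2.131 is taken from the text rather than computed, and the variable names are illustrative.

```python
from math import sqrt

INV_DIAGONAL = 9 / 720        # diagonal element of the inverse matrix
INV_OFF_DIAGONAL = -1 / 720   # off-diagonal element of the inverse matrix
T_95_15DF = 2.131             # tabulated value quoted in the text

# Standard error of each estimate, expressed as a multiple of sigma.
standard_error_factor = sqrt(INV_DIAGONAL)

# Correlation between any two of the four estimates.
correlation = INV_OFF_DIAGONAL / INV_DIAGONAL

# Half-width of the 95% confidence interval, as a multiple of sigma.
half_width_factor = T_95_15DF * standard_error_factor
```

The three quantities come out as approximately 0.1118, -1/9 and 0.238, matching the values quoted above.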

It is not proposed to discuss here the modifications to the above analysis 
when factors are quantitative with unequally spaced levels, or when they are 
qualitative. These are easily deduced from standard theory and are not relevant 
to the main discussion. 

In an interesting paper by Box and Behnken [10], some fractions of $3^n$ designs
are derived by an ingenious use of the properties of balanced incomplete block
designs. Inspection reveals that their designs consist, in every case, of the whole
or a part of a single hypersphere, together with some replications of the centre-
point. Their approach has the advantage that it leads directly to a method for
blocking the designs. It also indicates how, for more than 5 factors, a fraction
of one hypersphere can be selected to give an efficient second-order design.
Before concluding the discussion of fractions of $3^n$ designs, it may be noted
366 R. E. FRY 


that R. M. DeBaun recently investigated [11] a wide variety of symmetrical
fractions of a $3^3$ design. DeBaun was interested in the properties of these
fractions as response surface designs rather than as conventional factorial designs,
for which they would be rather too small. Nevertheless it is relevant to notice
that of those fractions constituting less than 20 (out of 27) treatment
combinations, the one with the best orthogonality properties was, in DeBaun's notation,
the cuboctahedron plus 4 centre-points. This turns out to be the design
obtained by omitting the first and third hyperspheres. The model assumed in this
instance was of the second degree, i.e. it included only effects of the type $A_1$, $A_2$
and $A_1B_1$. DeBaun combined this fraction of 16 trials with some (fully
replicated) two-level factors, thus giving a fraction of a $2^m3^n$ design. In the following
section, the alternative method of obtaining fractions of $2^m3^n$ designs will be
discussed. This is the method of selecting standard fractional replicates from
each hypersphere, rather than the omission of complete hyperspheres.


3. FRACTIONS OF $2^m3^n$ DESIGNS


Equation (1) demonstrates that in the hypersphere model of a $3^n$ design, each
hypersphere consists of a complex of $2^k$ designs. One corollary of this is that if
the design is now extended by the addition of some 2-level factors to make a
mixed-level $2^m3^n$ design, then these extra factors can be regarded as extensions


TABLE 5
Half-replicate of a $2^3 3^2$ Design

[The treatment combinations, listed under the headings Centre, First Hypersphere and Second Hypersphere, are not legible in this copy.]




TABLE 6
Matrix of Coefficients for the Normal Equations in a Half-replicate of a $2^3 3^2$ Design

The symmetric matrix of coefficients is of the form

    [ A  0  0  0 ]
    [    A  0  0 ]
    [       A  0 ]
    [          B ]

where 0 is everywhere a null matrix, and where

    A = [ 36   4   0   0 ]
        [     36  16  16 ]
        [         72   0 ]
        [             72 ]

Column identifications for the three A blocks:

    A    (BC)    $(AP_2)$    $(AQ_2)$
    B    (AC)    $(BP_2)$    $(BQ_2)$
    C    (AB)    $(CP_2)$    $(CQ_2)$

B is a diagonal matrix (all off diagonal elements are zero) with column identifications

    $P_1$, $P_2$, $Q_1$, $Q_2$, $(AP_1)$, $(BP_1)$, $(CP_1)$, $(AQ_1)$, $(BQ_1)$, $(CQ_1)$,
    $(P_1Q_1)$, $(P_2Q_1)$, $(P_1Q_2)$, $(P_2Q_2)$


to the $2^k$ designs already existing on each hypersphere. In the algebraic terms
of equation (1), this is represented by

$$2^m 3^n = 2^m(1 + 2)^n = 2^m + n(2^{m+1}) + \binom{n}{2}(2^{m+2}) + \cdots + 2^{m+n}.$$


Since there are now $2^k$ designs (with $k > 0$) everywhere from the centre to the
outermost hypersphere, the method of fractionation by selecting conventional
fractional replicates of these $2^k$ designs seems particularly suitable. Thus one
might take a half-replicate of every $2^k$ design on each hypersphere. This was in
fact tried for a $2^3 3^2$ factorial and resulted in a design with quite reasonable
properties. However it is possible to improve the design slightly by the following
method. It has been shown earlier that in the $3^2$ part of the design, the
interaction between the quadratic components of the 3-level factors is represented
by the contrast between the centre-point plus the second hypersphere on the
one hand, and the first hypersphere on the other. This suggests that the two
half-replicates of the $2^3$ part of the design should be allocated respectively to
these two fractions of the $3^2$ part.

The treatment combinations in the design are as shown in Table 5, the first
three digits of each code representing the levels of the three 2-level factors
(−1 or 1), and the remaining two digits representing the levels of the two 3-level
factors (−1, 0, or 1).

For the purposes of analysis, we may once again assume the orthogonal 
polynomial model containing only main effects and two-factor interactions. 
Let the three two-level factors be denoted by the letters A, B and C and the 
two three-level factors by the letters P and Q. The nature of the normal equa- 
tions obtained by a least squares analysis is indicated in Table 6, by the sym- 
metric matrix of the coefficients of the estimates. 

The matrix shows that correlation between effects occurs only within each 
of three similar groups of four effects as follows: 


A, (BC), $(AP_2)$, $(AQ_2)$
B, (AC), $(BP_2)$, $(BQ_2)$
C, (AB), $(CP_2)$, $(CQ_2)$


These three groups are orthogonal to each other and to the remaining effects, 
which are themselves mutually orthogonal, and which may therefore be esti- 
mated in the conventional manner. Each group of four correlated effects has 
the same matrix of coefficients for the normal equations as is apparent from 
Table 6. The inverse of this matrix is given in Table 7. 
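The allocation rule described above can be sketched in code to confirm the coefficients of a typical correlated group. The sketch assumes that the two half-replicates of the $2^3$ part are defined by ABC = +1 and ABC = −1, and that the ABC = +1 half is placed on the centre and second hypersphere; the sign choice and all function names are assumptions made for illustration.

```python
from itertools import product

def half_replicate():
    """36-run design: ABC = +1 on the centre and second hypersphere,
    ABC = -1 on the first hypersphere (sign choice is an assumption)."""
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        for p, q in product((-1, 0, 1), repeat=2):
            sphere = abs(p) + abs(q)      # 0 = centre, 1 = first, 2 = second
            wanted_sign = -1 if sphere == 1 else 1
            if a * b * c == wanted_sign:
                runs.append((a, b, c, p, q))
    return runs

def quadratic(level):
    """Orthogonal quadratic component of a three-level factor: 1, -2, 1."""
    return 3 * level * level - 2

def gram_matrix(runs):
    """Coefficient matrix of the normal equations for A, (BC), (AP2), (AQ2)."""
    columns = [
        [a for a, b, c, p, q in runs],
        [b * c for a, b, c, p, q in runs],
        [a * quadratic(p) for a, b, c, p, q in runs],
        [a * quadratic(q) for a, b, c, p, q in runs],
    ]
    return [[sum(x * y for x, y in zip(u, v)) for v in columns] for u in columns]

design = half_replicate()
matrix = gram_matrix(design)
```

Under these assumptions `matrix` reproduces the block A of Table 6, with diagonal 36, 36, 72, 72 and off-diagonal entries 4 and 16.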

From this inverted matrix may be derived the estimating formulae, variances 
and covariances of the correlated effects, and the matrix of correlation co- 
efficients shown in Table 8. 

The low values of the correlation coefficients in Table 8 show the design to be 
a practical one with a reasonably unambiguous interpretation. Further details of 


TABLE 7
Inverse of the matrix of the normal equations for a typical group of four correlated effects in a
half-replicate of a $2^3 3^2$ design

              A           (BC)        $(AP_2)$    $(AQ_2)$

A             0.028212   −0.003906    0.000868    0.000868
(BC)         −0.003906    0.035156   −0.007813   −0.007813
$(AP_2)$      0.000868   −0.007813    0.015625    0.001736
$(AQ_2)$      0.000868   −0.007813    0.001736    0.015625







TABLE 8
Correlation coefficients of a typical group of four correlated effects in a half-replicate of a $2^3 3^2$ design

              A          (BC)       $(AP_2)$    $(AQ_2)$

A             1.0000    −0.1240     0.0413      0.0413
(BC)                     1.0000    −0.3333     −0.3333
$(AP_2)$                            1.0000      0.1111
$(AQ_2)$                                        1.0000


the analysis follow exactly the same lines as in the design discussed in Section 2 
and will not be given here. The only feature that need be mentioned is that the 
design leaves nine degrees of freedom for the residual variance. 
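The entries of Tables 7 and 8 can be recomputed from the coefficient matrix of Table 6. The following sketch inverts that matrix in exact arithmetic and converts the inverse to correlations; the Gauss-Jordan routine and its names are illustrative, not the author's method of calculation.

```python
from fractions import Fraction
from math import sqrt

def invert(matrix):
    """Gauss-Jordan inverse in exact rational arithmetic."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Block A of Table 6; order of effects: A, (BC), (AP2), (AQ2).
NORMAL = [[36, 4, 0, 0], [4, 36, 16, 16], [0, 16, 72, 0], [0, 16, 0, 72]]
inverse = invert(NORMAL)

def correlation(i, j):
    """Correlation between estimates i and j, from the inverse matrix."""
    return float(inverse[i][j]) / sqrt(float(inverse[i][i] * inverse[j][j]))
```

For example, `inverse[0][0]` equals 65/2304 (0.028212 to six places) and `correlation(1, 2)` equals −1/3.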


4. A HALF-REPLICATE OF A $2^3 3^2$ DESIGN BY W. S. CONNOR


W. S. Connor has recently published [3] an example of his general method
of fractionating $2^m3^n$ designs. The example is, like the one given in Section 3,
a half-replicate of a $2^3 3^2$ design. Briefly, the method by which it is constructed
is as follows: The $2^3$ part of the design is split into two conventional
half-replicates $S_1$ and $S_2$. The $3^2$ part of the design is split into three conventional
one-third replicates $S'_1$, $S'_2$ and $S'_3$. The final design can then be represented
symbolically by


$$S_1S'_1 + S_2S'_2 + S_2S'_3$$


where $S_1S'_1$, for example, represents all possible combinations of treatment
combinations in $S_1$ with treatment combinations in $S'_1$. This gives a design
of 36 out of a total of 72 trials. Connor analyses his design using the same
orthogonal polynomial model as used in the designs discussed previously, and
shows it to have a quite remarkable degree of orthogonality. Assuming a model
with only main effects and two-factor interactions, the only correlations between
effects are as follows, where the symbol ~ denotes correlation.

$$A \sim BC, \qquad B \sim AC, \qquad C \sim AB$$

The correlation coefficient between each of these pairs of estimates is (1/3). 
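The value 1/3 can be illustrated with a small construction. The sketch below assumes that the half-replicates are defined by ABC = +1 ($S_1$) and ABC = −1 ($S_2$), that the one-third replicates are defined by $(P + Q) \bmod 3$ with levels coded 0, 1, 2, and that the design pairs them as $S_1S'_1$, $S_2S'_2$, $S_2S'_3$. These are assumptions for illustration; Connor's own defining contrasts are not given in this section.

```python
from itertools import product
from math import sqrt

def connor_design():
    """36 runs formed from the pairs (S1, S'1), (S2, S'2), (S2, S'3)."""
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        half = 1 if a * b * c == 1 else 2            # assumed split on ABC
        for p, q in product((0, 1, 2), repeat=2):
            third = (p + q) % 3 + 1                  # assumed split on P + Q mod 3
            if (half, third) in {(1, 1), (2, 2), (2, 3)}:
                runs.append((a, b, c, p, q))
    return runs

def correlation_a_bc(runs):
    """Correlation between the estimates of A and BC, treating this pair as
    the only non-orthogonal effects (2 x 2 normal equations)."""
    s_aa = sum(a * a for a, b, c, p, q in runs)
    s_bb = sum((b * c) ** 2 for a, b, c, p, q in runs)
    s_ab = sum(a * b * c for a, b, c, p, q in runs)
    return -s_ab / sqrt(s_aa * s_bb)

runs = connor_design()
rho = correlation_a_bc(runs)
```

With these choices the cross-product of the A and BC columns is −12 against diagonal entries of 36, which gives a correlation of 1/3 between the two estimates.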

It is not easy to generalize about how high the correlation between the least 
squares estimates of two regression coefficients needs to be before it becomes 
objectionable. One point can be mentioned. In cases of very high correlation, 
the linear functions of the observations which constitute the estimates will 
contain a small group of observations (in which resides the limited amount of 
orthogonality) which will be much more heavily weighted than the remainder. 
The estimates will consequently be much more vulnerable to the effect of "wild"
observations which may occur within this small group. However this aspect
is not of any practical importance in the two designs put forward in this paper,
or in Connor's design.


5. THE POSSIBILITY OF FURTHER DEVELOPMENTS


Only two small examples of fractional designs have been discussed in preceding
Sections, but it is clear that there is considerable scope for investigation of
other designs derived by the same basic principles. For example, in the larger
$3^n$ and $2^m3^n$ designs, one could investigate the effect of removing sub-sets other
than the sub-set of alternate hyperspheres. Also, the effect of selecting quarter-
replicates (and higher fractions) of the $2^k$ designs within the hyperspheres could
be tried. Various combinations of both techniques are also possible. In addition,
the idea of obtaining a "half-replicate" of a $3^n$ design by confounding a single
degree of freedom component of a high order interaction, might be extended to
the case where "quarter-replicates" (or higher fractions) are obtained by
confounding two or more such components. Further investigations into the above
mentioned possibilities are proceeding, and it is hoped to publish the results at
a later date.

It is almost certain that there are many ways of obtaining new systematic
fractions of factorial designs with reasonably simple properties and low
correlations between the more important effects. Such designs, listed together with
full information on the design properties (such as estimating formulae, standard 
errors of estimates, correlation between estimates, analysis of variance tables) 
would be almost as easy to use as the more restricted array of completely 
orthogonal designs at present available. The next few years are likely to usher 
in a considerably more flexible approach to the concept of factorial experimental 
design. 


ACKNOWLEDGMENTS 


Thanks are due to the Central Electricity Generating Board for permission 
to publish this paper, and in particular to Mr. W. J. Allum for his encourage- 
ment and many discussions during the work leading up to this paper. 


The writer is also grateful to the referee for many useful comments, and 
especially for advice which led to an improvement in the design described in 
Section 3. 


REFERENCES 


1. Finney, D. J. (1945). The fractional replication of factorial experiments. Ann. Eug., 12,
291-301.

2. Kempthorne, O. (1952). The Design and Analysis of Experiments. John Wiley & Sons
Inc., New York.

3. Connor, W. S. (1960). Fractional factorial experiment designs of mixed $2^m3^n$ series.
Ind. Eng. Chem., 52 (No. 6) 69A-71A.

4. Bose, R. C. and Connor, W. S. (1960). Analysis of fractionally replicated $2^m3^n$ designs.
Proc. I. S. I., 37, 3-22.

5. Connor, W. S. and Young, Shirley (in press). Fractional factorial designs for
experiments with factors at two and three levels. Natl. Bur. Standards, Appl. Math. Ser., U. S.
Government Printing Office, Washington 25, D. C.

6. Satterthwaite, F. E. (1959). Random Balance Experimentation. Technometrics, 1,
111-137.

7. Budne, T. A. (1959). The application of random balance designs. Technometrics, 1, 139-155.

8. Youden, W. J., Kempthorne, O., Tukey, J. W., Box, G. E. P., and Hunter, J. S. (1959).
Discussion of the Papers of Messrs. Satterthwaite and Budne. Technometrics, 1, 157-193.

9. Morrison, M. (1956). Fractional replication for mixed series. Biometrics, 12, 1-19.

10. Box, G. E. P. and Behnken, D. W. (1960). Some new three level designs for the study of
quantitative variables. Technometrics, 2, 455-475.

11. De Baun, R. M. (1959). Response surface designs for three factors at three levels.
Technometrics, 1, 1-8.





TECHNOMETRICS AUGUST, 1961


A Study of the Group Screening Method 


G. S. WATSON


Research Triangle Institute and University of Toronto 


This paper discusses the problem of group screening methods wherein f factors
are sub-divided into groups of k factors each, forming g "group-factors". The group
factors are then studied using a Plackett and Burman design in g + 1 runs. The
two versions of the group factors are formed by maintaining all component factors
at their upper and lower levels respectively. All factors in groups found to have
large effects are then studied in a second stage of experiments. The author discusses
the problems of detection and false detection of factors, optimum group size, size of
program, and the role of costs in this sequential form of experimentation.


1. INTRODUCTION 


At the beginning of an investigation, a large number of factors that can affect 
the response may be suggested. It is usually found that very few of these factors 
do actually have any appreciable effect. Experimental designs for finding the 
few effective factors out of a large list of possible factors have been called Screen- 
ing Designs. The idea of putting the factors in groups, testing these group- 
factors, and then testing the factors in the significant group-factors, was sug- 
gested to the writer by W. S. Connor. Methods of this kind will be called Group 
Screening Methods. 

An excellent appraisal of the screening problem has been given by Box (1957). 
Thinking in terms of continuous primary variables and a response surface, he 
suggests that, in this problem, the surface is of constant height in all directions 
except in those of several of ‘the factors’—and these are the ones we wish to 
discover. The problems are to find what functions (singly or jointly) of the 
primary variables should be taken as ‘the factors’ and to choose the levels of 
these factors. Having surmounted these difficulties Box advocates the use of 
orthogonal two-level designs because the experiment may then be ‘blocked’ 
and carried out in an informal sequential manner. In the present paper, an 
identical strategy is adopted but different tactics are used. 

The considerations of this paper stem from the analogy of this problem with
the biological problem of the detection of a rare defect among the members of
a large population, Dorfman (1943) [a more convenient reference is a problem
in Feller (1957), p. 225]. Suppose this malady may be detected with certainty
from a minute sample of blood. Dorfman suggests that pooled blood samples be 
tested. The individual samples that formed a pooled sample are tested whenever 
the latter gives a positive result. A 100% screening may be achieved with sub- 
stantial saving by choosing the correct group size. 

Greater savings may be obtained by using schemes with more than two
stages. An extensive study of such schemes has been made by Sobel and Groll
(1959) and each might be made the basis of a group-screening design. It is
relevant to remark that they had interesting industrial, rather than biological,
applications in mind. In this initial investigation it seems best to study only
the simplest plan, namely that suggested by Dorfman, because there are real
difficulties in converting these binomial plans into continuous screening plans.

Although the intrinsic nature of the screening problem makes satisfactory 
assumptions difficult at the start, the approach below will suggest that efficient 
screening can be carried out by using group screening methods even when some 
of the assumptions made initially are not satisfied. Since the problem is not 
precise, it would be pointless to suggest any rigid procedures. 


2. THE PERFORMANCE OF GROUP SCREENING DESIGNS


Suppose that f factors are to be tested for their effect on the response. Initially, 
we will assume that 


(i) all factors have, independently, the same prior probability of being
effective, $p$ ($q = 1 - p$),
(ii) effective factors have the same effect, $\Delta > 0$,
(iii) there are no interactions present,
(iv) the required designs exist,
(v) the directions of possible effects are known,
(vi) the errors of all observations are independently normal with a constant
known variance, $\sigma^2$,
(vii) $f = gk$ where $g$ = number of groups, and $k$ = number of factors per
group.

These stringent assumptions are made only to provide a simple initial
framework. They will be discussed later and weakened. Actually they are not as
limiting as they appear.

In (ii), an "effective factor" is defined to be a factor which produces a
non-zero change in the mean response; because it is shorter, the term "real" is
sometimes used here instead of "effective." With these assumptions, the f factors
may be divided into g groups of k factors each, by any method. By (v), we can
call the upper level of each factor the level which may lead to a larger response.
If each group of factors is called a group-factor, the upper level of a group-factor
will be defined when all its component factors are at their upper levels. The
lower level of a group-factor is obtained by putting all its factors at their lower
levels. In this way, and with the addition of (iii), there is no chance of cancellation
of effects. By (iii) and (iv), we suppose that an experiment is run with these g
group-factors which enables their main effects to be unbiasedly estimated in
g + 1 runs. Thus the optimum multifactorial designs of Burman and Plackett
(1946) will be used. These exist only when g + 1 is divisible by 4. In simple
cases these are fractions of complete 2-level factorials. These designs use only
two levels and are symmetrical with respect to all factors, which seems reasonable
for screening. All the factors in significant group-factors are then tested in a
second stage experiment.

The “prior probability of a factor being effective” will be a contentious idea 
to many. It is imagined here that it will summarize the degree of belief of those 
who are concerned with the experiment. But, we assume below that it may 







TABLE 1
Definition of Group Factors

                              Group Factors

Levels         (A, B, C): X      (D, E, F): Y      (G, H, I): Z

Lower Level    (0, 0, 0): 1      (0, 0, 0): 1      (0, 0, 0): 1
Upper Level    (1, 1, 1): x      (1, 1, 1): y      (1, 1, 1): z


be handled like a probability. There are other ways of formalizing the essential 
idea that separates the screening problem from the classical factorial design 
problem. The method of this paper allows the analogy with the binomial plans 
mentioned in Section 1. If, for example, it is supposed that a small but known 
number f, of the f factors are effective, a different approach is called for, since 
the problem is now one of selection of a fixed number f, of factors. Grouping 
and a multistage approach could still be used. This will be discussed in another 
paper. 

For example, suppose that there are f = 9 factors, denoted by A, B, ..., I,
and that they are divided into g = 3 groups of k = 3 factors each, to form
group-factors (A, B, C), (D, E, F), (G, H, I). These may be denoted by X, Y, Z.
Then, if the upper and lower levels of the factors A, B, ..., I are defined so
that their possible effects are to give, separately, greater and lesser values to the
response, the upper and lower levels of the group-factors are defined by Table 1.
This is possible by assumptions (v) and (vii). By (iii) and (v), any group-factor
containing one or more effective factors will produce a non-zero change in the
mean response since no cancellations can take place. To test for the main effects
of these three group-factors, suppose we use a 1/2 replicate of a $2^3$ design defined
by the factor combinations x, y, z, xyz. In terms of the original factors, the
factor combination x, for example, is

   (A, B, C, D, E, F, G, H, I)
x: (1, 1, 1, 0, 0, 0, 0, 0, 0).


Using these four runs only, it is clear that the main effect of the first group-
factor X is aliased with the interaction of the group-factors Y and Z. By
assumption (iii), the latter effect is non-existent so that a significant group-factor implies
that one of its factors has a main effect. This test will be called the "first stage."
Suppose that the first stage test gives only the group-factor X as significant.
Then in the "second stage," the factors A, B, C will be tested. Because of the
number of factors involved, the same design will be used here as was used in the
first stage, that is, a 1/2 replicate of a $2^3$ confounding the ABC interaction.
Assumption (vi) ensures that the ordinary analysis of variance procedures may
be used to analyze these tests, although the author would prefer a more Bayesian
method of inference.
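The translation from group-factor combinations to levels of the original factors can be sketched as follows. The dictionary and function names are illustrative; levels are coded 0 (lower) and 1 (upper) as in Table 1.

```python
# Group-factors and their component factors, as in the nine-factor example.
GROUPS = {"x": "ABC", "y": "DEF", "z": "GHI"}

# First-stage half replicate quoted in the text.
FIRST_STAGE = ["x", "y", "z", "xyz"]

def expand(combination):
    """Levels (0 = lower, 1 = upper) of A..I for a group-factor combination."""
    upper = {factor for group in combination for factor in GROUPS[group]}
    return tuple(int(factor in upper) for factor in "ABCDEFGHI")

runs = {name: expand(name) for name in FIRST_STAGE}
```

For instance, `runs["x"]` is the combination displayed above, with A, B, C at their upper levels and the remaining six factors at their lower levels.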

In this second stage experiment, it may be decided to put all the factors
D, E, ..., I at their upper levels while varying A, B, C to get, in an obvious
notation, the factor combinations a, b, c, abc. Then abc is actually the same as
xyz for the levels of all the factors A, B, ..., I. If there is no objection to using
the response from the earlier run, only 3 additional runs would be required.
Similarly, if in the first stage the factor combinations 1, xy, yz, zx had been
used, and D, E, ..., I set at their lower levels for the second stage, the choice
there of the fraction 1, ab, bc, ca would again allow the saving of a run. For
reasons of simplicity in the mathematics and because it may often be possible,
we will assume in the theory that g + 1 runs are used in the first stage and nk
runs in the second stage, where n is the number of significant group factors.

Had two group factors, say X and Y, shown up as significant in the first 
stage, a design difficulty would arise because we would wish to test, in an or- 
thogonal 2-level design, 6 factors in 6 (or 7) runs. This is impossible and in 
practice one would use a Burman and Plackett design for 6 factors in 8 runs. 

In such a screening experiment, we want (a) to detect as many of the effective 
factors as possible, (b) to declare effective as few non-effective factors as possible, 
(c) to achieve these aims with as few runs as possible. The method of grouping 
and the group size, and the significance levels of the first and second stage tests 
are at our disposal. With the simple assumptions above, it will be shown below 
how these quantities affect the performance of the experimental plan. The 
performance will be good if criteria (a), (b), (c) above are, in some sense, satisfied. 

In Section 3, a simplified discussion will be given where experimental error
is ignored. This means that $\Delta$ is so large compared with $\sigma$ that all effective
factors will certainly be detected and so the only problem is how to carry out
the experiment with the minimum number of runs.

In Section 4, the effects of error are introduced again. Both Sections 3 and 4 
are the theoretical consequences of the assumptions. In Section 5, the results 
so obtained are examined from a practical point of view. This shows what as- 
sumptions are important and what modifications are necessary for the applica- 
tion of group screening designs. Section 6 gives a summary of the results obtained. 


3. CASE OF ZERO ERROR VARIANCE


An effective group-factor is a group-factor containing at least one effective
factor. By assumption (i), a group-factor which contains k factors is effective
with probability $1 - q^k$. When $\sigma = 0$, the chance that an effective group-factor
will be declared effective is unity since we have assumed by (iii) and (v) that
no interactions or cancellations will mask its effect. Hence, with g group-factors,
the expectation of the number of effective group-factors, n, is

$$E(n) = g(1 - q^k). \tag{3.1}$$

The probability that a factor is effective when it has been selected at random
from a group-factor known to be effective is

$$p' = \frac{p}{1 - q^k}. \tag{3.2}$$

Since all the single factors in group-factors found effective in the first stage
will be tested in the second stage, the expected number of effective factors
found is

$$E(n) \times kp', \tag{3.3}$$







which by (3.1) and (3.2) is $fp$, as it obviously must be. This, however, will not
be so in Section 4.

The number of runs taken is $g + 1$ in the first stage and $nk$ in the second stage,
where n has the binomial distribution, $b(n; g, 1 - q^k)$, using the notation of
Feller (op. cit.). Thus, the total number of runs R made has a mean given by

$$E(R) = kg(1 - q^k) + g + 1,$$

$$E(R) = f\left(1 - q^k + \frac{1}{k} + \frac{1}{f}\right). \tag{3.4}$$
The equation (3.4) is the same as Dorfman’s except for the addition of the last 
term, 1. To find the value of k which will minimize E(R) for fixed f and p (or q), 
E(R) must be computed for all integral values of k and the k, making E(R) 
least, chosen. For an approximate solution, obtained by allowing k to vary 
continuously, a transcendental equation must be solved. This value of k will 
be the optimum group size. 
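The integer search described above is short enough to sketch directly. The code below evaluates (3.4) and finds the group size minimizing the expected number of runs per factor; since the term $1/f$ does not depend on $k$, the search ignores it. The function names and the upper search limit `k_max` are illustrative assumptions, and the constraint that $k$ divide $f$ is not imposed.

```python
def expected_runs(f, k, p):
    """E(R) from (3.4): f(1 - q^k + 1/k + 1/f)."""
    q = 1 - p
    return f * (1 - q**k + 1 / k + 1 / f)

def relative_cost(k, p):
    """Expected tests per hundred factors, ignoring the 1/f term."""
    return 100 * (1 - (1 - p)**k + 1 / k)

def optimum_group_size(p, k_max=100):
    """Integer k minimizing the relative cost, by direct search."""
    return min(range(1, k_max + 1), key=lambda k: relative_cost(k, p))

priors = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06]
sizes = [optimum_group_size(p) for p in priors]
costs = [round(relative_cost(k, p)) for k, p in zip(sizes, priors)]
```

For these six prior probabilities the search gives group sizes 11, 8, 6, 6, 5, 5 and costs of 20, 27, 33, 38, 43 and 47 per cent.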
Dorfman defined the Relative Testing Cost as

$$\frac{\text{Expected Number of Tests}}{\text{Number of Individuals}} \times 100, \tag{3.5}$$

and the first three columns of Table 2 give his calculations of this quantity and
the optimum group size for a range of values of p. In our case, the correct
definition would be

$$\frac{E(R)}{f + 1} \times 100 \tag{3.6}$$

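TABLE 2
Optimum Group Sizes and Relative Testing Costs for Selected Prior Probabilities

Prior Probs.    Optimum Group Size, k    Relative Testing Cost, %    $1 - q^k$    $1 - q^k - kpq^{k-1}$

[The body of the table is largely illegible in this copy; the surviving entries of the Relative Testing Cost column read 20, 27, 33, 38, 43, 47.]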





Figure 1. Economies resulting from screening by groups (p denotes the prior probability). [Plot of runs per hundred factors against size of group.]


or

$$\frac{f}{f + 1}\left(1 - q^k + \frac{1}{k} + \frac{1}{f}\right)100, \tag{3.7}$$

whereas column three of Table 2 gives

$$\left(1 - q^k + \frac{1}{k}\right)100. \tag{3.8}$$


The error in using (3.8) instead of (3.7) should be slight and the gain in simplicity 
in being able to ignore f makes the use of the approximate formula (3.8) worth- 
while. 

The last two columns in Table 2 show respectively the chance that an optimum 
sized group-factor will contain at least one and at least two effective factors. 
The integral changes in k cause non-smooth variation in these columns. It will 
be noticed that for p < 0.15, there is less than 0.07 chance that more than one 
effective factor will appear in an optimum sized group-factor. This means that 
the possibility of factors canceling or interacting within a group-factor is not 
great in optimum sized groups. Interactions between group-factors are still 
a danger and are a reason for retaining assumption (iii). 

Figure 1 shows the expected number of runs per hundred factors plotted 
against the group size, for various prior probabilities. 

On the assumption that p is small, it is possible to derive some useful formulae.
The approximate optimum value of k satisfies the equation,

$$-q^k \log_e q - \frac{1}{k^2} = 0, \tag{3.9}$$

if k is allowed to vary continuously. When p is small,

$$\log_e q = \log_e(1 - p) \sim -p,$$
$$q^k = (1 - p)^k \sim 1 - kp,$$

so that (3.9) becomes

$$(1 - kp)p \sim \frac{1}{k^2}.$$

Thus, as $p \to 0$,

$$\text{optimum } k \sim \frac{1}{\sqrt{p}}. \tag{3.10}$$


To the same order of approximation, it may be shown for optimum group
sizes that

$$E(R) \sim 2g + 1 \tag{3.11}$$

and that

$$\text{Prob}\{\text{more than one effective factor in a group-factor}\} \sim p/2. \tag{3.12}$$
From (3.11),

$$\text{Relative testing cost } \% \sim \frac{200}{k}. \tag{3.13}$$

It will be seen that the approximations (3.10), (3.13), (3.12) give the second,
third and fifth columns of Table 2 with adequate accuracy for p up to about 0.10.
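The quality of these approximations can be examined with a few lines of code. The sketch below evaluates the small-p formulae, taking $k = 1/\sqrt{p}$ so that (3.13) becomes $200\sqrt{p}$; the function names are illustrative.

```python
from math import sqrt

def approx_group_size(p):
    """Approximation (3.10): optimum k is about 1 / sqrt(p)."""
    return 1 / sqrt(p)

def approx_relative_cost(p):
    """Approximation (3.13) with k = 1 / sqrt(p): cost is about 200 sqrt(p) per cent."""
    return 200 * sqrt(p)

def approx_two_or_more(p):
    """Approximation (3.12): chance of more than one effective factor is about p / 2."""
    return p / 2

k_approx = approx_group_size(0.01)
cost_approx = approx_relative_cost(0.01)
```

For p = 0.01 the approximations give a group size of 10 and a cost of 20 per cent, against an exact optimum of 11 and a cost of about 19.6 per cent.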


4. CASE OF NON-ZERO ERROR VARIANCE


In this section the analysis of Section 3 is extended to take account of the 
experimental error. There is now the possibility that effective group-factors 
and factors will not be detected and that ineffective group-factors and factors 
will be declared effective when they are not. As in Section 3, the analysis is 
based on the assumptions of Section 2 and so is again quite theoretical. The 
following intuitive comments should be made more explicit by the analysis below. 

If the significance tests at the end of the first stage are at level $\alpha$ and those
at the end of the second stage are at level $\beta$, it is clear that an increase in $\alpha$ will
usually increase the number of runs made, decrease the number of real factors
that are missed, and increase to a lesser extent the number of unreal factors
that may be declared real. These are opposing tendencies. An increase in $\beta$ does
not affect the average number of runs but does increase both the number of
real and unreal factors that are declared real. Finally, because of the "hidden
replication" in factorial experiments, the powers of the tests at both stages will
be increased by using more groups and by having more factors to test in the
second stage, that is, by increasing g and nk. Since f is fixed and f = gk, these
are conflicting requirements. In order, therefore, to choose a "good" plan,
some balancing of conflicting objectives will be required. This would most likely
be provided by economic considerations.

To introduce the errors and the size of the effects, we must consider the powers
of the first and second stage tests. In the first stage, g + 1 runs or observations
are made. Since we are assuming that optimum multifactorial designs always
exist and are used, the variance of the estimate (defined in the usual way with
divisor g + 1) of the main effect of each group-factor is

$$\sigma^2/(g + 1), \tag{4.1}$$

while the mean value of the estimate is, for $s = 0, 1, \ldots, k$,

$$s\Delta \text{ with probability } \binom{k}{s}p^s q^{k-s}. \tag{4.2}$$


There may be some readers who would prefer to define the mean effect of a 
group factor as 


$$E(s\Delta) = kp\Delta.$$


Since only a single degree of freedom (d.f.) is associated with the estimate
of an effect, a t-test will be used. By assumption (vi) the denominator d.f. are
infinite but we prefer to speak of t-tests because, later, d.f. may be introduced
to account for possible variation in $\sigma$. If $s\Delta$ is the mean effect of a group-factor,
the power of a t-test of it will be

$$\pi_I(s\phi_I, \alpha), \tag{4.3}$$

with

$$\phi_I = \frac{1}{\sqrt{2}}\,\frac{\Delta}{\sigma}\sqrt{g + 1}, \tag{4.4}$$

where $\phi_I$ is the parameter used in Table 10 of Hartley and Pearson (1954).
Hence, the average chance that a group-factor will be declared significant is

$$\pi_I^* = \sum_{s=0}^{k}\binom{k}{s}p^s q^{k-s}\,\pi_I(s\phi_I, \alpha). \tag{4.5}$$

Of course, the probability that an ineffective group-factor will be declared
significant is

$$\pi_I(0, \alpha) = \alpha, \tag{4.6}$$

the level of the test. It will be noticed also that, as $p \to 0$, $\pi_I^* \to \alpha$. The average
probability that an effective group-factor will be declared significant is

$$\pi_I^+ = \frac{1}{1 - q^k}\sum_{s=1}^{k}\binom{k}{s}p^s q^{k-s}\,\pi_I(s\phi_I, \alpha). \tag{4.7}$$

Since, for $\phi_I > 0$,

$$\pi_I(s\phi_I, \alpha) < \pi_I((s + 1)\phi_I, \alpha), \tag{4.8}$$

we have

$$\pi_I^+ \geq \pi_I(\phi_I, \alpha), \tag{4.9}$$

so that $\pi_I(\phi_I, \alpha)$ is a lower bound to the power for an effective group-factor.
However, since $\pi_I^+ \to \pi_I(\phi_I, \alpha)$ as $p \to 0$, (4.9) should be almost an equality for
small p.

The power of the t-test made in the second stage, again assuming that an
optimum multifactorial design is used, will be denoted by

$$\pi_2 = \pi_2(\phi_2, \beta), \tag{4.10}$$

with

$$\phi_2 = \gamma \sqrt{\frac{\text{No. of factors tested} + 1}{2}}\,\frac{\Delta}{\sigma}, \tag{4.11}$$

where

$$\gamma = \begin{cases} 0, & \text{for an ineffective factor,} \\ 1, & \text{for an effective factor.} \end{cases} \tag{4.12}$$


The number of factors tested will be an integral multiple of k. 

With the above definitions, the performance of group-screening designs may
now be examined. The number of group-factors declared significant will have
the binomial distribution $b(n; \pi_1^*, g)$ which has a mean value

$$E(n) = g\pi_1^* = g\{\alpha q^k + \pi_1'(1 - q^k)\},$$

so that

$$E(n) > g\{\alpha q^k + \pi_1(\phi_1, \alpha)(1 - q^k)\}, \tag{4.13}$$


with almost equality for p small. Furthermore, the probability, p', that a factor,
chosen at random from a group-factor which has been declared significant, is
effective, will be

$$p' = p\pi_1'/\pi_1^*, \tag{4.14}$$

and using (4.5) and (4.7) this may be rewritten

$$p' = \frac{p\pi_1'}{\alpha q^k + (1 - q^k)\pi_1'}. \tag{4.15}$$


A comparison of (4.15) with (3.2) shows they agree when $\alpha = 0$ and $\pi_1' > 0$,
a state of affairs that can only obtain when $\sigma = 0$ and p > 0. Also the probability
(4.15) is less than (3.2) since here a group factor may be declared significant
when it contains no effective factors.

The number $M_R$ of effective factors declared significant in the second stage
tests has a complex distribution. Let $m_i$ of the effective factors, from the ith
group-factor that was declared significant (any ordering), be declared significant.
Given that n group-factors were declared significant,

$$m_i \ \text{is} \ b\!\left(m_i;\ p'\pi_2\!\left(\sqrt{\tfrac{nk+1}{2}}\,\tfrac{\Delta}{\sigma},\ \beta\right),\ k\right), \quad n > 0, \tag{4.16}$$

using the facts that real factors are detected with probability $\pi_2$ and that with
probability p', $\gamma = 1$ in (4.11), (4.12). Then

$$M_R = m_1 + m_2 + \cdots + m_n. \tag{4.17}$$

For fixed n, the mean value of $M_R$ is

$$E(M_R \mid n) = nkp'\,\pi_2\!\left(\sqrt{\tfrac{nk+1}{2}}\,\tfrac{\Delta}{\sigma},\ \beta\right). \tag{4.18}$$

It is clear that $M_R$ does not have a compound binomial distribution because
of the dependence of $\pi_2$ on n. However

$$E(M_R) = kp' \sum_{n=0}^{g} n\,\pi_2\!\left(\sqrt{\tfrac{nk+1}{2}}\,\tfrac{\Delta}{\sigma},\ \beta\right) \binom{g}{n} \pi_1^{*n} (1 - \pi_1^*)^{g-n} \tag{4.19}$$

gives the expected number of effective factors detected by this group-screening
plan. From a result similar to (4.8),

$$E(M_R) > fp'\pi_1^*\,\pi_2\!\left(\sqrt{\tfrac{k+1}{2}}\,\tfrac{\Delta}{\sigma},\ \beta\right). \tag{4.20}$$

By (4.14), this may be written as

$$E(M_R) > fp\,\pi_1'\,\pi_2\!\left(\sqrt{\tfrac{k+1}{2}}\,\tfrac{\Delta}{\sigma},\ \beta\right), \tag{4.21}$$

where fp is the expected number of effective factors present. $E(M_R)/fp$ is a
measure of the efficiency of the design for detecting effective variables. Using
the inequality (4.9) in conjunction with (4.21), we have shown that

$$E(M_R) > fp\,\pi_1(\phi_1, \alpha)\,\pi_2\!\left(\sqrt{\tfrac{k+1}{2}}\,\tfrac{\Delta}{\sigma},\ \beta\right). \tag{4.22}$$

If both the powers in (4.22) are high, (4.22) should be nearly an equality. The
same should also be true when p is small.

The number of unreal factors found, $M_U$, may be discussed in the same way.
With q' = 1 - p',

$$E(M_U \mid n) = nkq'\beta \tag{4.23}$$

so that

$$E(M_U) = kq'\beta g\pi_1^* = f\beta(\pi_1^* - p\pi_1'), \tag{4.24}$$
by (4.14). Since 
(4.25) 
from (4.5) and (4.7), 


ee ae 
E(Mv) = fq8 =. (4.26) 


where fq is the expected number of unreal factors present. As $p \to 0$, $E(M_U)$
tends to $f\alpha\beta$.

Finally, the number of runs used,

$$R = nk + g + 1, \tag{4.27}$$

may easily be studied since n is $b(n; \pi_1^*, g)$.
Thus,

$$E(R) = f\pi_1^* + g + 1. \tag{4.28}$$


The prediction made in the second paragraph of this section may now be
examined in the light of the above formulae. Since an increase in $\alpha$ increases
$\pi_1^*$, E(R) is increased, by (4.5) and (4.28). From (4.19), it is hard to be sure
that an increase in $\alpha$ increases $E(M_R)$ but the lower bound to $E(M_R)$ given in
(4.22) certainly increases. From (4.26), $E(M_U)$ increases with increase in $\alpha$.
E(R) does not change at all with $\beta$ but (4.19) and (4.26) show that $E(M_R)$ and
$E(M_U)$ do increase with $\beta$. The extent of these movements can only be studied
by numerical examples.

The reason for the excessively simple assumptions of Section 2 is now evident. 
Even with them, the analysis is too complex for an easy assessment of the results. 

If $\alpha$ and $\phi_1$ are large enough, $\pi_1(s\phi_1, \alpha)$ for s = 1, 2, ..., k will be nearly
unity so that, from (4.5), we have approximately

$$\pi_1^* = \alpha q^k + (1 - q^k). \tag{4.29}$$

Substituting this approximation in (4.28),

$$E(R) \approx f\{\alpha q^k + (1 - q^k)\} + g + 1. \tag{4.30}$$

Thus the main alteration from the results of Section 3 comes from the term
$f\alpha q^k$. Writing now for simplicity

$$E(R) \approx f\{1 - (1 - \alpha)q^k + 1/k\}, \tag{4.31}$$

(4.31) is minimized for continuous variation in k by the solution of

$$-(1 - \alpha)q^k \log q - \frac{1}{k^2} = 0. \tag{4.32}$$

When $p \to 0$, this has the asymptotic solution

$$k \approx \frac{1}{\sqrt{(1 - \alpha)p}}. \tag{4.33}$$

Thus the optimum group sizes are greater here than those given in Table 2.
If $\alpha$ is never larger than 0.20, k is never increased by more than 12%. Since
one would rarely contemplate groups of size greater than 10, this is no more
than unity in the value of k. Considering the approximations involved and the
difficulty of formulating the problem, this seems to indicate that Table 2 and
Figure 1 may be used in practice, where experimental errors are not negligible
compared with the sizes of the effects sought, if minimizing the number of runs
is the prime objective.

The analysis of this section reduces to that of Section 3 as $\sigma \to 0$, or as $\Delta \to \infty$,
only if simultaneously $\alpha, \beta \to 0$.

If a cost function is defined in terms of $E(M_R)$, $E(M_U)$, and E(R), it could
perhaps be minimized by choice of k, $\alpha$ and $\beta$.
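The minimization of (4.31) can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names are illustrative, and the integer search simply evaluates the bracketed term of (4.31) for k = 1, 2, ... and compares the minimizer with the asymptotic value $1/\sqrt{(1-\alpha)p}$.

```python
from math import sqrt

def relative_runs(k, p, alpha):
    """Bracketed term of (4.31): E(R)/f ~ 1 - (1 - alpha) * q**k + 1/k."""
    q = 1.0 - p
    return 1.0 - (1.0 - alpha) * q**k + 1.0 / k

def best_integer_k(p, alpha, k_max=200):
    """Integer group size minimizing the approximation (4.31)."""
    return min(range(1, k_max + 1), key=lambda k: relative_runs(k, p, alpha))

def asymptotic_k(p, alpha):
    """Small-p solution of the stationarity condition: 1/sqrt((1 - alpha) p)."""
    return 1.0 / sqrt((1.0 - alpha) * p)

for p in (0.01, 0.05, 0.15):
    for alpha in (0.0, 0.05, 0.20):
        print(p, alpha, best_integer_k(p, alpha), round(asymptotic_k(p, alpha), 2))
```

For any p, the ratio `asymptotic_k(p, 0.20) / asymptotic_k(p, 0.0)` equals $1/\sqrt{0.8} \approx 1.118$, which is the "not more than 12%" increase in k mentioned above.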


5. PRACTICAL IMPLICATIONS 


In this section the direct application of the results of Sections 3 and 4 is 
shown numerically. Furthermore, more general assumptions are examined; for 
in practice the assumptions of Section 2 are not all satisfied and this leads to more 
complex group-screening designs. 

To illustrate the direct application, the example of Section 2 with g = 3,
k = 3 will be used. Initially it will be assumed that $\sigma = 0$ so that Section 3 is
appropriate. From Table 2, this design is called for with f = 9 if .30 > p > .13.
Suppose that, in a discussion between all the people in charge of the investigation
of which this experiment forms a part, it is agreed that "all factors have about
the same chance of being effective and it is expected that only one or two of
them will be; interactions would be trivial compared with main effects." (To
this will be added later "which are of size $\Delta$, while the error variance is known
to be $\sigma^2$.") It is not meant by this that there are definitely one or two effective
factors. If so, the task then would be to select them, and a different theory
would be required than that given here. Rather we suppose that this means
there is a prior probability of between 1/9 = 0.11 and 2/9 = 0.22 of a factor
being effective, i.e., having a main effect. For this reason we select an
intermediate value p = 0.15 and write, by Table 2, 9 = 3 × 3.

Then the probabilities of finding 0, 1, 2, 3 significant group-factors are
respectively 0.23, 0.44, 0.27, 0.06 when the effects are very large compared with
$\sigma$ so that $\sigma$ is effectively zero. If $\sigma$ is zero, a factorial design is not required. In
fact, for various numbers of significant group factors, the required total numbers
of runs are as follows:

Number of Significant     Required Total
Group-factors             Number of Runs
        0                        4
        1                        7
        2                       10
        3                       13

Thus, the expected number of runs is

4 × 0.23 + 7 × 0.44 + 10 × 0.27 + 13 × 0.06 = 7.48.

If all nine factors are run without screening, ten runs are required. Thus, the
relative testing cost is 75 percent. This assumes that all required designs exist,
which is not true. The figure closely agrees with that in Table 2.
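The arithmetic of this example can be reproduced with a short sketch. It assumes, as the text does for $\sigma = 0$, that a group-factor is declared significant exactly when it contains at least one effective factor, so that the number of significant group-factors is binomial with probability $1 - q^k$. The function names are illustrative only.

```python
from math import comb

def group_significance_probs(p, k, g):
    """P(n significant group-factors), n = 0..g, in the error-free case."""
    pi = 1.0 - (1.0 - p) ** k          # P(group holds >= 1 effective factor)
    return [comb(g, n) * pi**n * (1.0 - pi) ** (g - n) for n in range(g + 1)]

def expected_runs(probs, k, g):
    """E(R) with R = n*k + g + 1, assuming minimal designs always exist."""
    return sum(pr * (n * k + g + 1) for n, pr in enumerate(probs))

probs = group_significance_probs(p=0.15, k=3, g=3)
e_runs = expected_runs(probs, k=3, g=3)
print([round(pr, 2) for pr in probs])              # probabilities for n = 0..3
print(round(e_runs, 2), round(100 * e_runs / 10))  # E(R) and relative cost in %
```

With unrounded probabilities the expected number of runs comes out marginally below the 7.48 obtained from the two-decimal values quoted in the text; the relative testing cost still rounds to 75 percent.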

Next, let only factorial designs be used at both stages. Roughly half the time
the experiment would turn out as explained in Section 2. Roughly one quarter
of the time, two group-factors would be significant, and so, in the second stage,
six factors would have to be tested. In this case the second stage design could
be based on a $2^3$ design. If no use is made of the first stage results, this would
leave one degree of freedom (d.f.) for error; if use is made of one run in the
first stage, two d.f. are available. In about one case in twenty, all factors would
have to be tested so that the first stage has been a total waste of effort. In this
case a $2^4$ design could be used, leaving 6 d.f. for estimation of error, in this case
a useful number to check the prior value of $\sigma^2$ or even to make the experiment
self-contained or to check for the existence of interactions. According to formula
(3.4) the expected number of runs is 8.3. The above more realistic assessment is

4 × 0.23 + 8 × 0.44 + 12 × 0.27 + 20 × 0.06 = 8.9,


which is slightly larger, because the designs have more runs than the theory
envisages. Since 16 runs would be needed to test the 9 factors without grouping,
the relative testing cost of the above plan is [8.9/16] × 100 = 56% when
competition is restricted to two-level factorial designs. Of course, if the error is not
confidently known, some of this efficiency is illusory. For adding on 6 runs to
the group-plan so that it always has 6 d.f. for error, the expected number of runs is

10 × 0.23 + 14 × 0.44 + 17 × 0.27 + 20 × 0.06 = 14.2,

and the relative testing cost is 89 percent, which brings the figure above that
in Table 2, 72%.

Finally consider the use of the best Burman and Plackett designs. The only
change will be the use of a design in 12 runs for the case of all three group factors
significant so that the expected number of runs is reduced to 8.66. The relative
testing cost, within this class of designs, is [8.66/12] × 100 = 72%, the value in
Table 2.
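The comparisons above all have the same form: a table of total runs for 0, 1, 2, 3 significant group-factors, weighted by the probabilities 0.23, 0.44, 0.27, 0.06, and divided by the runs an ungrouped design of the same family would need. A small sketch follows; the dictionary labels are illustrative, and because the rounded probabilities are used, the totals can differ in the second decimal from the figures quoted in the text.

```python
probs = [0.23, 0.44, 0.27, 0.06]   # rounded values from the text, n = 0..3

# name: (total runs for n = 0..3 significant group-factors, ungrouped runs)
plans = {
    "minimal designs":           ([4, 7, 10, 13], 10),
    "two-level factorials":      ([4, 8, 12, 20], 16),
    "factorials + 6 error d.f.": ([10, 14, 17, 20], 16),
    "Burman-Plackett":           ([4, 8, 12, 16], 12),
}

expected = {}
relative_cost = {}
for name, (runs, ungrouped) in plans.items():
    expected[name] = sum(p * r for p, r in zip(probs, runs))
    relative_cost[name] = 100.0 * expected[name] / ungrouped
    print(f"{name:28s} E(R) = {expected[name]:5.2f}  cost = {relative_cost[name]:4.1f}%")
```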

This demonstrates, for effects large compared with $\sigma$, that the reduction in
the testing cost, due to using a group-plan, is not over-stated in Table 2, even
when the only possible designs are larger than the simple theory here postulates.
This being so, Figure 1 shows that if p is overestimated, the gain from group 
testing will be greater than is supposed. In the present case with k = 3, used 
because p was assessed at 0.15, the relative testing cost, if p is really 0.01, will 
be about one half of the figure in Table 2. If p is underestimated, the gain will 
be less than is supposed. Since it is likely that, in this sort of problem, p > 0.02, 
it would seem that group sizes between 3 and 8 only will be used. In practice, 
the estimate of p will never be very firm. Since all the relevant curves in Figure 
1 for p > .02 are increasing at k = 6, it is suggested that k be chosen only from 
3, 4, 5, 6. This is tantamount to increasing slightly estimates of p that are small 
or setting a lower limit to p. 

Before considering the errors due to testing when $\sigma$ is not negligible, another
important modification of the above theory in practice must be discussed. It is
unlikely that all the factors would be given the same prior probabilities. If the p
used above is the greatest of the prior probabilities suggested for all the factors, the
analysis of Section 2 is conservative, i.e., for a given k, the expected number
of runs will never be greater than that predicted and usually will be less. This
will be true however the factors are put into groups of a fixed equal size. But
it is obvious that the result does now depend on how they are put into groups
and that groups of different sizes may be better than equal groups. In the simplest
case, suppose the factors fell into two groups: $f_1$ with prior probability $p_1$ and
$f_2\ (= f - f_1)$ with prior probability $p_2$. Arranging the first set into $g_1$ groups of
size $k_1$ and the second into $g_2$ groups of size $k_2$, it is clear that

$$E(R) = k_1 g_1(1 - q_1^{k_1}) + k_2 g_2(1 - q_2^{k_2}) + g_1 + g_2 + 1, \tag{5.1}$$

which is minimized by taking $k_1$ and $k_2$ from the rows of Table 2 corresponding
to $p_1$ and $p_2$. Thus it is best to put the likely factors together in small groups
and the unlikely factors together in larger groups. Taking this idea to the limit,
any factor with p > .3 will certainly be tested on its own. Table 2 may still be
used to get a rough idea of the possible gains to be derived from grouping. If
the minimum value of E(R) in (3.4) is denoted by $E_{f,p}(R)$, then the minimum
value of (5.1) is

$$E_{f_1,p_1}(R) + E_{f_2,p_2}(R) - 1. \tag{5.2}$$

Since $E_{f,p}(R)$ is f/100 times Column 3 of Table 2, it may be found easily and
used to compute (5.2) and hence the relative testing cost for this simple case of
unequal group sizes. The same method deals with any number of subsets of the
factors.
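Formula (5.1) and its extension to any number of subsets are easy to evaluate directly. The sketch below is illustrative only: the function name and the second subset (16 factors with p = 0.05 in 4 groups of 4) are hypothetical, and the single-subset value is taken to be $kg(1 - q^k) + g + 1$, i.e. (4.28) in the error-free case.

```python
def expected_runs_subsets(subsets):
    """E(R) as in (5.1), generalized to a list of (g_i, k_i, p_i) subsets.

    Each subset holds g_i groups of k_i factors with prior probability p_i;
    the error-free case (sigma = 0) is assumed throughout.
    """
    total_groups = sum(g for g, _, _ in subsets)
    second_stage = sum(g * k * (1.0 - (1.0 - p) ** k) for g, k, p in subsets)
    return second_stage + total_groups + 1

likely = (3, 3, 0.15)      # 9 factors, p = 0.15, 3 groups of 3 (from the text)
unlikely = (4, 4, 0.05)    # hypothetical: 16 factors, p = 0.05, 4 groups of 4

combined = expected_runs_subsets([likely, unlikely])
separate = expected_runs_subsets([likely]) + expected_runs_subsets([unlikely]) - 1
print(round(combined, 3), round(separate, 3))   # the two agree, as in (5.2)
```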

This device of using different group sizes, when the prior probabilities differ,
not only adds something to the efficiency of group-screening but also greatly
simplifies the design problem. For f will usually not be divisible into equal
groups of the required size, and the designs for the first and second stages may
need more runs than the minimum assumed in the theory. For example, suppose
there are f = 50 factors, of which about 30 have p ≈ .02, about 10 have p ≈ .07
and about 10 have p ≈ .12. Since for p = 0.02, optimum k = 8; for p = 0.07,
optimum k = 5; and for p = 0.12, optimum k = 4, inspection of the factors
may suggest that we can use the partition 50 = 4 × 8 + 2 × 5 + 2 × 4, giving
8 groups in the first stage. For the first stage, this would require a Burman and
Plackett design with 12 runs and give 3 d.f. for error. If it were possible to use
either of the partitions (with seven groups)

50 = 5 × 8 + 1 × 6 + 1 × 4,
   = 5 × 8 + 2 × 5,

then the first stage would use 8 runs. The nearest equal-group design would
require that one of the factors be dropped. Then 7 groups of 7 factors each would
be a very convenient design for the first stage, but, since 20 of the factors have
p ≥ .07, in the second stage one is likely to have too many factors to test. None
of the possibilities with 7 groups are satisfactory because they require too many
large groups, so that it would seem better to be conservative, i.e., to allow for
possible underestimation of the p's, and use

50 = 4 × 6 + 3 × 5 + 2 × 4 + 1 × 3.

This involves 10 groups and so one could use 12 runs for the first stage (leaving
1 d.f. for error). A similar efficiency will be obtained by putting the most likely
factors together and using

50 = 11 × 4 + 2 × 3,

which has 13 groups, in the more familiar $2^4$ design. Since on the above
assumptions we expect 2 or 3 factors to be effective, and since they are unlikely to
occur in the same groups, we may, in this latter case, expect between 6 and 12
factors to be tested in the second stage. Thus it is most unlikely that more than
16 runs will be required there, making 32 in all. An ungrouped experiment with
the 50 factors would have to use $2^6$ = 64 runs, if one uses factorials, so that a
50% saving should be the minimum with this last plan. An optimum factorial
design using 52 runs could however be used, leading to a saving from grouping
of 40%. These calculations could be made more precise by extending the
arguments of Section 3.

So far in this section, only the implications of Section 3 have been discussed.
The results obtained, for the case of effects large compared with $\sigma$, suggest that

(i) factors should be grouped according to their p-values,
(ii) a number of partitions of f should be considered,
(iii) the best partition, if no error estimate is required, uses group sizes close
to, but not greater than, the optimum sizes, and the number of groups
should be as near as possible to, but at least one less than, a multiple of 4,
(iv) an estimate of the efficiency of any proposed design be made by (3.4)
or its extensions.

It must now be determined how non-negligible error variance modifies these
simple grouping rules. In addition to the manner of grouping, a choice of the
significance levels $\alpha$ and $\beta$ must be made. Initially the discussion of Section 2
will be further extended.

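With 9 factors in 3 groups of 3 and p = .15, it follows from the discussion at
the beginning of this section that the essential quantities in Section 4 are:

$$\phi_1 = \sqrt{2}\,\Delta/\sigma,$$

$$\pi_1^* = 0.614\alpha + 0.325\pi_1(\phi_1, \alpha) + 0.057\pi_1(2\phi_1, \alpha) + 0.003\pi_1(3\phi_1, \alpha),$$

$$\pi_1' = 0.844\pi_1(\phi_1, \alpha) + 0.148\pi_1(2\phi_1, \alpha) + 0.008\pi_1(3\phi_1, \alpha),$$

$$E(M_R) = 3\,\frac{0.15\,\pi_1'}{\pi_1^*} \sum_{n=0}^{3} n\,\pi_2\!\left(\sqrt{\tfrac{3n+1}{2}}\,\tfrac{\Delta}{\sigma},\ \beta\right) \binom{3}{n} \pi_1^{*n}(1 - \pi_1^*)^{3-n},$$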

E(M,) = 19.18{x* — 0.723(x* — 0.15a)}, 
19.18{0.277r* + 0.108a}, 
$$E(R) = 9\pi_1^* + 4.$$


These quantities will be computed when $\Delta/\sigma = \sqrt{2}$, 2, 3 for $\alpha$ = 0.01, 0.05,
0.20 and $\beta$ = 0.01, 0.05. Thus

$$\phi_1 = 2,\ 2.83 \text{ and } 4.24.$$

In referring to Pearson and Hartley's chart, the d.f. are taken as $\infty$, i.e., $\sigma$
is assumed known. Since we have assumed the direction of effects known,
one-sided tests would be used so that $\alpha$ = 0.05 and 0.01 on the chart are really
2.5 and 0.5% significance levels. As optimum-sized groups, for p < .15, have
only a chance 0.06 or less of containing two or more effective factors, we will
in practice not need to know the directions of all the possible effects and so
two-sided tests will almost invariably be used. Hence we will, in fact, take
$\alpha$ = 0.05 and $\alpha$ = 0.01 as 5 and 1% levels. The powers for $\alpha$ = 0.20 were
computed ab initio. The results obtained are shown in Table 3.
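The first-stage quantities can be approximated without the chart. The sketch below is not the author's computation: it assumes $\sigma$ known, a two-sided normal test at level $\alpha$, and that the chart parameter $\phi$ corresponds to a standardised shift $\sqrt{2}\,\phi$ (which follows from (4.1) and (4.4) under these assumptions). Function names are illustrative, and the values produced need not coincide with those read from the chart.

```python
from math import comb, sqrt
from statistics import NormalDist

_STD = NormalDist()

def power_two_sided(phi, alpha):
    """Power of a two-sided level-alpha normal test, sigma known.

    Assumption: standardised shift delta = sqrt(2) * phi. Requires 0 < alpha < 1.
    """
    z = _STD.inv_cdf(1.0 - alpha / 2.0)
    delta = sqrt(2.0) * phi
    return _STD.cdf(delta - z) + _STD.cdf(-delta - z)

def pi1_star(p, k, phi1, alpha):
    """Average chance (4.5) that a group-factor is declared significant."""
    q = 1.0 - p
    return sum(comb(k, s) * p**s * q ** (k - s) * power_two_sided(s * phi1, alpha)
               for s in range(k + 1))

def expected_runs(p, k, g, phi1, alpha):
    """E(R) = f * pi1_star + g + 1, as in (4.28), with f = g * k."""
    return g * k * pi1_star(p, k, phi1, alpha) + g + 1

for ratio in (sqrt(2.0), 2.0, 3.0):          # Delta / sigma
    phi1 = sqrt(2.0) * ratio                 # g = 3 in (4.4)
    for alpha in (0.01, 0.05, 0.20):
        print(round(phi1, 2), alpha, round(expected_runs(0.15, 3, 3, phi1, alpha), 2))
```

At $\phi = 0$ the power function returns $\alpha$, in agreement with (4.6), and for very large $\phi_1$ the value of `pi1_star` tends to $\alpha q^k + (1 - q^k)$, in agreement with (4.29).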

An examination of Table 3 verifies the predictions at the beginning of Section
4 and shows the extent of the changes. First we note that an increase in $\beta$ from
0.01 to 0.05 makes only a slight improvement in $E(M_R)$ but multiplies $E(M_U)$
by 5. Since f is only nine these movements are all small. Thus it seems that the
test at the second stage should be at a low level of significance. An increase in


TABLE 3 


0.05 


E(Mp)* ‘ 75 95 
E(My) ‘ .02 03 
E(Mp) : .93 iat 
E(My) ; 10 15 

16 8.40 


E(Mp) 25 1.28 
E(My) 02 .03 
E(Mp) 31 1.33 
E(My) ll 15 

65 8.57 


E(Mp) : .35 1.35 
E(My) : . 02 04 
E(Mpr) . .35 1.35 
E(Mvy) F 11 .20 
E(R) : 74 10.22 


* The values for $E(M_R)$ were computed using $\pi_2(\sqrt{3n/2}\,\Delta/\sigma, \beta)$ instead of
$\pi_2(\sqrt{(3n + 1)/2}\,\Delta/\sigma, \beta)$.


$\alpha$ increases the mean number of runs and the mean number of real factors
detected. However, the change in $\alpha$ from 0.05 to 0.20 results mainly in an increase
in the number of runs. All the movements disappear as $\Delta/\sigma$ increases, except
for an increase in the number of runs. The mean number of real factors expected
to be present is 9 × .15 = 1.35. The mean number of detected real factors
closely approaches this number for $\Delta/\sigma > 2$. As has been explained earlier, there
is no question of an optimum design without introducing costs. However, these
calculations show that the 3 groups of 3 plan is not satisfactory unless $\Delta/\sigma > 2$
because a real factor has a good chance of being missed, as judged by the mean
values. If $\Delta/\sigma > 2$, it could be used and a slight saving in runs would be achieved
by keeping $\alpha$ down to 0.05.

One final observation may be made. The values of E(R) increase as $\Delta/\sigma$
increases, and they decrease as $\alpha$ decreases. In the limit, $\Delta/\sigma \to \infty$, $\alpha \to 0$,
E(R) = 6.5, the value found from Table 2. If the relative testing cost is defined,
as before, as 100 E(R)/f, then the relative testing costs are greater in the presence
of error than without. But in this case E(R) should be compared with the number
of runs required by some other design that gives the same $E(M_R)$ and $E(M_U)$.
So it is best to say simply that the E(R) will be larger in practice than is implied
by Table 2. Provided $\alpha$ is not large, Table 3 suggests that E(R) should not be
inflated by more than 15%. It was shown at the end of Section 4 that the
optimum group size in the presence of error is slightly larger than it is without error.
However it is hard to believe that this would reduce E(R) and counteract the
above inflation completely.

It is difficult to show how the presence of error affects the general
recommendations when the prior probabilities vary. If the group sizes are properly chosen,
there will be approximately none or one effective factor in a group. Thus the
$\pi_1^*$ for a group of factors with prior probabilities $p_i$ ($q_i = 1 - p_i$), of size $k_i$
and of effect-size $\Delta_i$, is roughly


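$$\pi_{1i}^* = \alpha q_i^{k_i} + (1 - q_i^{k_i})\,\pi_1\!\left(\sqrt{\tfrac{g+1}{2}}\,\tfrac{\Delta_i}{\sigma},\ \alpha\right), \tag{5.3}$$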



















Eq) = Dek, 


t=1 











BR) ~ Dik + 9 +2. (5.5) 


These formulae are simple generalizations of (4.5), (4.12), and (4.28). More
approximations are needed to get a suitable formula for $E(M_R)$ and $E(M_U)$. Now


E(M,) = 7. E(m,), 


ow ps k pint me; ’ 


t=1 


= kent, +1... 
T2 = off BB a »Bl, (5.7) 


go , 
a pint: /at, : 






and 









(5.8) 
Finally 


E(Mv) ~ B > bife% ~ pert. (5.9) 


For various partitions of f, the values of E(R), $E(M_R)$ and $E(M_U)$ could be
computed and compared. This would not be an attractive task but it is hard
to see how a choice could be made between alternative partitions by any other
method.








6. CONCLUSIONS AND SUMMARY 


This paper attempts to evaluate the efficacy of the group-screening method. 
The essential idea is to proceed sequentially, testing the factors in groups and 
then testing the groups that appear to contain effective factors. This procedure 
has simplicity, generality and intuitive appeal. However, to examine the per- 
formance characteristics of the method, many assumptions and definitions are 
required. Some of these are essential because there is, in the design of
experiments, a sort of uncertainty principle whereby if the number of runs is decreased,
the number of assumptions is increased; and conversely. Other assumptions and
definitions are made for convenience and are more a matter of taste—there 
is no standard formulation of the screening problem to call upon. 

Thus in the detailed formulations, all factors are treated symmetrically and
the belief that only a few of them will be effective is formalized by introducing
a prior probability p for each factor independently to be effective. In the simple
case where the effects, if any, are large compared with the error, it is clear that
one plan is better than another if it takes fewer runs. Because the number of runs
R is a random variable, this is not sufficient and here, somewhat arbitrarily,
the expected number of runs E(R) has been used as the criterion. Also only
two stage plans have been considered. Accepting these three facts, the
analysis of Section 3 and its extensions in Section 5 shows how to arrive at
group screening plans which are certainly much better than no grouping at all.
Approximately, the optimum group size k is $p^{-1/2}$ and the relative testing cost
is 200/k per cent.

When the effects are not large compared with error there is no certainty 
that they will be detected and there is also the possibility that non-effective 
factors will be declared effective. The criterion for comparing different plans 
is now not simply E(R) for it is also important to detect all the effective factors. 
It seems that only some cost considerations could provide a suitable criterion. 
The analysis of Section 4 shows how the choice of group size, and of the two
significance levels $\alpha$ and $\beta$, affects the expectations of the numbers of runs, real
factors detected and unreal factors declared effective. These results may be
used, as they are in a worked example in Section 5, to determine whether a group 
plan will perform satisfactorily. Further research and practical experience seem 
to be necessary at this point. A reasonable approach now seems to be: 


(i) determine the optimum design in the absence of error, 
(ii) examine its performance, 


(iii) if not satisfactory, decrease k and/or replicate in the second stage to 
improve the power of the tests. 


The writer wishes to thank W. S. Connor for suggesting group-screening plans
and for many helpful discussions.


8. References 


Box, G. E. P., "Integration of Techniques for Process Control," Trans. 11th Ann. Conv.,
ASQC, 1958.

Burman, J. P. and Plackett, R. L., "The design of optimum multifactorial experiments,"
Biometrika, Vol. 33, (1946), pp. 305-325.

Dorfman, R., "The detection of defective members of large populations," Annals of
Mathematical Statistics, Vol. 14, (1943), pp. 436-440.

Feller, William, An introduction to probability theory and its applications, John Wiley &
Sons, Inc., New York, 1957.

Pearson, E. S. and Hartley, H. O., Biometrika tables for statisticians, Vol. 1, Cambridge
University Press, 1954.

Sobel, M. and Groll, P. A., "Group-testing to eliminate efficiently all defectives in a
binomial sample," The Bell System Journal, Vol. 38, (1959), pp. 1179-1252.

















Vol. 3, No. 3 TECHNOMETRICS August, 1961


Missing Values in Response Surface Designs¹


Norman R. Draper²


Mathematics Research Center, United States Army, Madison, Wisconsin 






The estimation of missing values for a general design is described and discussed. 
Formulae are provided for the estimation of missing values for two well-known, 
three factor, second order rotatable designs, with zero to six center points. A worked 
example illustrates the use of the formulae in the case of the cube plus octahedron 
plus one center point design. 




















1. INTRODUCTION 


Suppose we fit by least squares, to the results $y' = (y_1, y_2, \ldots, y_N)$ of an
experimental investigation, a regression equation of the form $\hat{y} = Xb$, i.e., the
model considered is $y = X\beta + \epsilon$, where $\epsilon \sim N(0, I\sigma^2)$. Then $b = (X'X)^{-1}X'y$.
Usually when an observation is missing, we simply drop out the corresponding
row in the X matrix. However, in the case of a rotatable design and certain
other response surface designs, where the $X'X$ matrix and its inverse are already
known (e.g. designs given by Cochran and Cox [10]), we might prefer to estimate
the missing value(s) and then proceed with the analysis as originally planned,
except for adjustments due to the loss of degrees of freedom. We shall now see
what this involves.

Suppose that the matrix $X'$ is divided into $[X_1', X_2']$ in such a way that $X_1$
is associated with yield values $y_1$ that are observed, and $X_2$ is associated with
yield values $y_2$ that are missing. This is easily effected by rearranging the order
of the symbols $y_1, y_2, \ldots, y_N$ so that the f (say) missing values occupy the
last f places $y_{N-f+1}, \ldots, y_N$ and rearranging the rows of X to correspond so
that the model remains the same.





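2A. METHOD (A)

Tocher [1] gives the following:

$$E(y) = E\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}\beta.$$

Thus the estimates from the observed responses would be

$$\begin{aligned}
b &= (X_1'X_1)^{-1}X_1'y_1 \\
  &= (X'X - X_2'X_2)^{-1}X_1'y_1 \\
  &= (X'X)^{-1}(I - X_2'X_2(X'X)^{-1})^{-1}X_1'y_1 \\
  &= (X'X)^{-1}(I + X_2'MX_2(X'X)^{-1})X_1'y_1,
\end{aligned}$$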

















¹ Sponsored by the United States Army under Contract No. DA-11-022-ORD-2059,
Mathematics Research Center, United States Army, Madison, Wisconsin.
² Now at the Department of Statistics, University of Wisconsin.

where $M = (I - X_2(X'X)^{-1}X_2')^{-1}$ and the identity $(I + AB)^{-1} =
I - A(I + BA)^{-1}B$, [1], has been employed. After some algebra, the expected
values for the missing observations are seen to be

$$\hat{y}_2 = X_2 b = MX_2(X'X)^{-1}X_1'y_1.$$

Since

$$X_1'y_1 = [X_1', X_2']\begin{bmatrix} y_1 \\ 0 \end{bmatrix} = X'y_0, \text{ say, it follows that } \hat{y}_2 = MX_2b_0,$$

where $b_0$ is the estimate of $\beta$ obtained from the data assuming the missing
observations have zero values. The estimate of $\beta$ using the data $Z = [y_1', \hat{y}_2']'$ is thus

$$\begin{aligned}
(X'X)^{-1}X'Z &= (X'X)^{-1}(X_1'y_1 + X_2'\hat{y}_2) \\
  &= (X'X)^{-1}(X_1'y_1 + X_2'MX_2(X'X)^{-1}X_1'y_1) \\
  &= (X'X)^{-1}(I + X_2'MX_2(X'X)^{-1})X_1'y_1 \\
  &= b.
\end{aligned}$$

It follows that the results of an experiment with missing values can be analyzed
in the following way. Perform a standard analysis with all missing observations
given zero values to obtain $b_0$. Evaluate $\hat{y}_2 = MX_2b_0$. Perform the standard
analysis using $\hat{y}_2$ for the missing observations. The final coefficients b obtained
will be the same as those that would have been obtained if $(X_1'X_1)^{-1}X_1'y_1$ had
been evaluated.


2B. METHOD (B)


We estimate the missing values $y_2$ by choosing them in such a way that the
residual sum of squares is minimized with respect to those values. Now:

$$\begin{aligned}
\text{Residual sum of squares } S^2 &= y'y - b'X'y \\
  &= y'y - y'X(X'X)^{-1}X'y \\
  &= y'Hy
\end{aligned}$$

where

$$H = I - X(X'X)^{-1}X'.$$

(Note that $X'H = X' - X'X(X'X)^{-1}X' = 0$. This provides a useful check on
H.) Let $H = (h_1, h_2, \ldots, h_N) = (h_1', h_2', \ldots, h_N')'$, since H is symmetric.
Differentiating $S^2$ partially with respect to $y_i$ and setting the result equal to
zero to satisfy the condition for a minimum we find that $h_i'y = 0$, or $h_{i1}y_1 +
h_{i2}y_2 + \cdots + h_{iN}y_N = 0$. Since $(\partial/\partial y_i)(h_i'y) = h_{ii} > 0$, because H is positive
definite when $S^2 > 0$, this equation does give a minimum.

Solution of the equation for $y_i$ provides us with the estimate $\hat{y}_i$ of a single
missing value $y_i$. If two values $y_i$ and $y_j$ are missing, we must solve the
simultaneous equations $h_i'y = h_j'y = 0$ for $\hat{y}_i$ and $\hat{y}_j$. The obvious extension applies
for more missing values.

It is easy to see that this method is equivalent to the one given by Tocher.
For, since $X' = [X_1', X_2']$, it follows that $X'X = X_1'X_1 + X_2'X_2$ and

$$H = \begin{bmatrix} I - X_1(X'X)^{-1}X_1' & -X_1(X'X)^{-1}X_2' \\ -X_2(X'X)^{-1}X_1' & I - X_2(X'X)^{-1}X_2' \end{bmatrix}.$$

Thus if $y_1' = (y_1, y_2, \ldots, y_{N-f})$ is the vector of observed values and
$y_2' = (y_{N-f+1}, \ldots, y_N)$ is the vector of missing values, we obtain the estimates
$\hat{y}_2$ from

$$-X_2(X'X)^{-1}X_1'y_1 + (I - X_2(X'X)^{-1}X_2')\hat{y}_2 = 0$$

which implies that

$$\hat{y}_2 = MX_2(X'X)^{-1}X_1'y_1$$

as before. Thus the two methods (A) and (B) are equivalent and lead to the
same estimates for $\hat{y}_2$ and b.
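The equivalence of methods (A) and (B) can be illustrated on a deliberately small case. The sketch below is not taken from the paper: it uses a hypothetical straight-line model (rows of X equal to (1, x)) with one missing response, so that $(X'X)^{-1}$ is a 2 × 2 inverse and M is a scalar. The data values and function names are made up for illustration.

```python
def moments(xs):
    """n, sum x, sum x^2 and det(X'X) for rows (1, x)."""
    n, sx, sxx = len(xs), sum(xs), sum(x * x for x in xs)
    return n, sx, sxx, n * sxx - sx * sx

def fit_line(xs, ys):
    """Least-squares (intercept, slope) from the 2x2 normal equations."""
    n, sx, sxx, det = moments(xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (sxx * sy - sx * sxy) / det, (n * sxy - sx * sy) / det

def cross(xs, a, b):
    """(1, a) (X'X)^{-1} (1, b)' computed for the FULL design xs."""
    n, sx, sxx, det = moments(xs)
    return (sxx - (a + b) * sx + n * a * b) / det

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1.1, 1.9, 3.2, None, 5.1]        # hypothetical data, one value missing
miss = ys.index(None)

# Method (A): fit with the missing value set to zero, then y2_hat = M X2 b0.
b0 = fit_line(xs, [0.0 if y is None else y for y in ys])
M = 1.0 / (1.0 - cross(xs, xs[miss], xs[miss]))
y_hat_a = M * (b0[0] + b0[1] * xs[miss])

# Method (B): solve h_i' y = 0 for the missing y_i, with H = I - X (X'X)^{-1} X'.
h_ii = 1.0 - cross(xs, xs[miss], xs[miss])
y_hat_b = sum(cross(xs, xs[miss], x) * y
              for x, y in zip(xs, ys) if y is not None) / h_ii

# Coefficients from the completed data equal those from the observed rows only.
b_filled = fit_line(xs, [y_hat_a if y is None else y for y in ys])
b_direct = fit_line([x for x, y in zip(xs, ys) if y is not None],
                    [y for y in ys if y is not None])
print(y_hat_a, y_hat_b, b_filled, b_direct)
```

With several missing values M becomes an f × f matrix and the same steps apply with matrix arithmetic in place of the scalar division.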


3. ANALYSIS OF VARIANCE 


The correct analysis of variance table is

Source            S.S.              d.f.          M.S.
Coefficients b    $b'X_1'y_1$       j
Residual          by difference     N - f - j     $s^2$
Total             $y_1'y_1$         N - f

in which only the observations $y_1$ appear and where j is the number of
coefficients estimated. The correct variance-covariance matrix of the b coefficients
is, [1], estimated by $s^2$ times

$$(X_1'X_1)^{-1} = (X'X)^{-1} + (X'X)^{-1}X_2'MX_2(X'X)^{-1},$$

as can easily be verified.


4. DISCUSSION 


In general it would be preferable to use method (A), especially for a design
used only once, since this would involve the evaluation of only that portion of
the matrix $H = I - X(X'X)^{-1}X'$ which was required, and thus considerable
calculation would be avoided. Note particularly that M is an f by f matrix
and so in the case of one or two missing values (i.e., f = 1 or 2), the calculations
would not be difficult. However, when a design is frequently used, it is better
to evaluate H once and for all and then make use of the appropriate portion
of it when required. This speeds up the estimation part of the work and also
reduces the chance of error in the computation. Furthermore, it is particularly
easy to explain the method to computing assistants. In cases where the
statistician decides either that he is prepared to use $(X'X)^{-1}s^2$ instead of the
correct $(X_1'X_1)^{-1}s^2$ for the estimated variance-covariance matrix of the
regression coefficients, or that he does not wish to examine the standard errors
of individual coefficients at all, only the missing values are required and, when H
is available, this involves very little work indeed. (The effect on the standard
errors of using the incorrect variance-covariance matrix is illustrated in a worked
example later in the text.) In Section 6, we shall give the matrix H for the
following well-known [2, 3, 4, 5, 6, 10], three-factor, second order rotatable designs:


(1) cube plus doubled octahedron plus n center points, n ≤ 6.


(2) cube plus octahedron plus n center points, n ≤ 6.


392 NORMAN R. DRAPER 


Because these designs have some levels which are neither integral nor rational, 
some or all of the numbers which occur in their H matrices cannot be expressed 
in rational form (see Tables 1 and 2). When designs with integral or rational 
levels are considered, however, this difficulty does not arise, and the estimation 
of missing values for such designs is extremely simple, once H is available. 
These remarks would apply, for example, to the three factor, three level re- 
sponse surface designs discussed by De Baun [7] and the three level designs of 
Box and Behnken [8]. Appropriate calculations for five designs given by De 
Baun appear in an earlier version of this paper [9]. 

5. NOTATION


To the results of a group of experiments on three coded factors $x_1, x_2, x_3$, it
is desired to fit the second order model


$$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_{11}x_1^2 + \beta_{22}x_2^2 + \beta_{33}x_3^2 + \beta_{12}x_1x_2 + \beta_{23}x_2x_3 + \beta_{31}x_3x_1.$$
Thus each row of X consists of the values of
$$1,\ x_1,\ x_2,\ x_3,\ x_1^2,\ x_2^2,\ x_3^2,\ x_1x_2,\ x_2x_3,\ x_3x_1$$


at one point of the experimental design and the vector of coefficients to be
estimated is


$$\beta' = (\beta_0, \beta_1, \beta_2, \beta_3, \beta_{11}, \beta_{22}, \beta_{33}, \beta_{12}, \beta_{23}, \beta_{31}).$$
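The two designs mentioned above both have X'X of the same form, namely
$$
X'X = \left[\begin{array}{cccccccccc}
N & 0 & 0 & 0 & C & C & C & 0 & 0 & 0 \\
  & C' & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
  &  & C' & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
  &  &  & C' & 0 & 0 & 0 & 0 & 0 & 0 \\
  &  &  &  & D & B & B & 0 & 0 & 0 \\
  &  &  &  &  & D & B & 0 & 0 & 0 \\
  &  &  &  &  &  & D & 0 & 0 & 0 \\
  &  &  &  &  &  &  & B' & 0 & 0 \\
  &  &  &  &  &  &  &  & B' & 0 \\
\multicolumn{4}{l}{\text{Symmetric}} &  &  &  &  &  & B'
\end{array}\right]
\begin{array}{c} 0 \\ 1 \\ 2 \\ 3 \\ 11 \\ 22 \\ 33 \\ 12 \\ 23 \\ 31 \end{array}
$$


where the design consists of the N points $(x_{1u}, x_{2u}, x_{3u})$, u = 1, 2, ..., N, and
where


$$C = C' = \sum_u x_{iu}^2, \qquad D = \sum_u x_{iu}^4, \qquad B = B' = \sum_u x_{iu}^2x_{ju}^2, \qquad i, j = 1, 2 \text{ or } 3;$$


i ≠ j. Also, since the designs are rotatable, D = 3B. (The column to the right
of X'X indicates the order in which the rows and columns of X'X have been
arranged and thus the order in which the estimated coefficients of the second
order model will emerge.) The inverse $(X'X)^{-1}$ is of the same form, with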





MISSING VALUES IN RESPONSE SURFACE DESIGNS 


$P = 10B^2A$ in place of N,
$Q = -2BCA$ in place of C,
$R = \{4NB - 2C^2\}A$ in place of D,
$S = (C^2 - NB)A$ in place of B,
$1/B'$ in place of $B'$,
$1/C'$ in place of $C'$,

where $1/A = 2B(5NB - 3C^2)$, see [2].
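
As a check on these expressions, the inverse can be compared numerically with P, Q, R and S. The following sketch assumes the cube plus doubled octahedron design with a = √2 and one center point; the row order used to build X and the function names are illustrative assumptions and do not affect X'X.

```python
import numpy as np

def model_row(x1, x2, x3):
    # Column order 0, 1, 2, 3, 11, 22, 33, 12, 23, 31, as in the text.
    return [1.0, x1, x2, x3, x1**2, x2**2, x3**2, x1*x2, x2*x3, x3*x1]

def doubled_octahedron_design(n_center):
    # Hypothetical helper: cube, each axial point twice (a = sqrt 2), center points.
    a = np.sqrt(2.0)
    cube = [(x1, x2, x3) for x3 in (-1, 1) for x2 in (-1, 1) for x1 in (-1, 1)]
    axial = []
    for axis in range(3):
        for sign in (-1, 1):
            point = [0.0, 0.0, 0.0]
            point[axis] = sign * a
            axial += [tuple(point)] * 2
    points = cube + axial + [(0.0, 0.0, 0.0)] * n_center
    return np.array([model_row(*p) for p in points])

n = 1
X = doubled_octahedron_design(n)
N = X.shape[0]
C = np.sum(X[:, 1] ** 2)                      # sum of x_1u^2
B = np.sum(X[:, 1] ** 2 * X[:, 2] ** 2)       # sum of x_1u^2 x_2u^2
A = 1.0 / (2 * B * (5 * N * B - 3 * C ** 2))
P, Q = 10 * B ** 2 * A, -2 * B * C * A
R, S = (4 * N * B - 2 * C ** 2) * A, (C ** 2 - N * B) * A

inv = np.linalg.inv(X.T @ X)
pairs = {
    "P": (inv[0, 0], P), "Q": (inv[0, 4], Q),
    "R": (inv[4, 4], R), "S": (inv[4, 5], S),
    "1/C'": (inv[1, 1], 1 / C), "1/B'": (inv[7, 7], 1 / B),
}
all_match = all(np.isclose(got, want) for got, want in pairs.values())
print(all_match)
```

For n = 1 the same numbers follow from the closed forms of Section 6.1, for example P = 5/(4 + 5n) = 5/9.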
6. THE MISSING VALUE FORMULAE FOR THE CHOSEN DESIGNS
6.1. Cube plus doubled octahedron plus n center points, rotatable. 


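            x_0   x_1   x_2   x_3   x_1^2  x_2^2  x_3^2  x_1x_2  x_2x_3  x_3x_1

X =  1       1    -1    -1    -1     1      1      1       1       1       1
     2       1     1    -1    -1     1      1      1      -1       1      -1
     3       1    -1     1    -1     1      1      1      -1      -1       1
     4       1     1     1    -1     1      1      1       1      -1      -1
     5       1    -1    -1     1     1      1      1       1      -1      -1
     6       1     1    -1     1     1      1      1      -1      -1       1
     7       1    -1     1     1     1      1      1      -1       1      -1
     8       1     1     1     1     1      1      1       1       1       1
     9, 10   1    -a     0     0     a^2    0      0       0       0       0
    11, 12   1     a     0     0     a^2    0      0       0       0       0
    13, 14   1     0    -a     0     0      a^2    0       0       0       0
    15, 16   1     0     a     0     0      a^2    0       0       0       0
    17, 18   1     0     0    -a     0      0      a^2     0       0       0
    19, 20   1     0     0     a     0      0      a^2     0       0       0
    21, ..., (20+n)
             1     0     0     0     0      0      0       0       0       0
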
where a = √2. The matrix H is of size (20 + n) by (20 + n) and has the form

        1 2 3 4 5 6 7 8    9 10 11 12 13 14 15 16 17 18 19 20    21 ... (20+n)
H =  |  p r r s r s s q    t  t  u  u  t  t  u  u  t  t  u  u     v ...  v
     |  (the remaining rows are not legible in this copy)
     |  Symmetric


The numerical values of the symbols in the body of the matrix H can be 
found by referring to Table 1, using the column with the appropriate number 
of center points n indicated at the head of the column. We also note that 


N = 20 + n,      P = 5/(4 + 5n),

C = 16,          Q = −2/(4 + 5n),

D = 24,          R = (4 + n)/4(4 + 5n),
B = 8,           S = (12 − n)/16(4 + 5n).







TABLE 1 
Cube plus doubled octahedron plus n center points, rotatable 




Note: Each entry in a column must be divided by the divisor of that column to give the 
correct value. However this is necessary only when obtaining M for the analysis of 


variance. In the estimation equations the divisor is common and cancels. 
* does not occur. 


6.2. Cube plus octahedron plus n center points, rotatable. 


For X, refer to Section 6.1, set a = 1.681793, a² = 2√2, and delete the
tenth, twelfth, fourteenth, sixteenth, eighteenth and twentieth rows of the X
matrix given there; alternatively, see [10], noting that our $x_2x_3$ and $x_3x_1$ columns
are interchanged. The matrix H is of size (14 + n) by (14 + n) and has the same
form as the H matrix in Section 6.1 but with the tenth, twelfth, fourteenth,
sixteenth, eighteenth and twentieth rows and columns deleted.

The numerical values of the symbols in the body of the matrix H must now 
be taken from Table 2, using the column with the appropriate number of center 
points n indicated at the head of the column. We also note that 


N = 14 + n,                  P = 5/(34 − 24√2 + 5n),

C = 8 + 4√2 = 13.656854,     Q = −(4 + 2√2)/4(34 − 24√2 + 5n),

D = 24,                      R = (8 − 4√2 + n)/4(34 − 24√2 + 5n),
B = 8,                       S = (8√2 − 2 − n)/16(34 − 24√2 + 5n).







TABLE 2 
Cube plus octahedron plus n center points, rotatable 


         n = 0       n = 1       n = 2       n = 3       n = 4       n = 5       n = 6

q     -0.280331   -0.230913   -0.230624   -0.230526   -0.230478   -0.230448      ...
r     -0.073223   -0.023805   -0.023516   -0.023418   -0.023370   -0.023340      ...
s      0.073223    0.122641    0.122930    0.123028    0.123076    0.123106    0.123125
t     -0.123146   -0.193034   -0.193443   -0.193580   -0.193649   -0.193691      ...
u      0.123146    0.053258    0.052849    0.052712    0.052643    0.052601    0.052574
v         *        0.023982    0.012061    0.008056    0.006048    0.004842    0.004036

(The rows for p, w, x, y and the remaining symbols are not legible in this copy.)


* does not occur. 


It will be noticed in Table 2 that, as n increases from 1 to 6, the values of
p, q, r, s, t, u, w, x and y change very little, while v, y', z and z' change very
nearly in proportion to 1/n. Thus the estimates of any missing values obtained
by using the columns for n = 2, 3, 4, 5 or 6 would differ very little from estimates
obtained by first finding the mean yield from the n center points when n > 1, and
then using this mean as a single center point observation with the n = 1 column.
The estimates of missing values in other designs are more sensitive to additional
center point observations, however.


7. ILLUSTRATIVE EXAMPLE 


(This example is constructed. The data were obtained by rounding off selected 
observations in a numerical example of response surface analysis given by 
Cochran and Cox [10, Table 8A.8].) 

The cube plus octahedron plus one center point design was used for a group 
of experiments to the results of which it was intended to fit a second order 
response surface of the form shown in Section 5. The results y, in the order
defined by the order of the rows of matrix X in section 6.2, were


$y' = (16, c, 16, 7, 15, 8, 20, 5;\ d, 0, 25, 18, 7, 12;\ 24)$.


The letters c and d represent missing yields which are to be estimated. Missing 
are the second and ninth results. Using the notation of section 2B and referring 



to the matrix H of section 6.2 (with n = 1), we see that
$h_2' = (r, p, s, r, s, r, q, s;\ u, t, t, u, t, u;\ v)$,
$h_9' = (t, u, t, u, t, u, t, u;\ w, x, y, y, y, y;\ z)$,


where the values of the symbols are as given in the n = 1 column of Table 2.
The estimation equations $h_2'y = h_9'y = 0$ are thus


$p\hat{c} + u\hat{d} = -20q - 31r - 36s - 32t - 30u - 24v = 4.944919$,
$u\hat{c} + w\hat{d} = -62y - 67t - 20u - 24z = 6.554246$.
Now
$$\begin{pmatrix} p & u \\ u & w \end{pmatrix}^{-1} = \begin{pmatrix} 3.100700 & -0.421558 \\ -0.421558 & 2.610095 \end{pmatrix}$$


so that the required estimates are $\hat{c} = 12.570$, $\hat{d} = 15.023$.
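
The same estimates can be obtained by forming H directly. The sketch below is an illustration, not the author's computing procedure: it assumes the cube is listed in standard order with $x_1$ changing fastest and the six axial points in the order (−a, 0, 0), (a, 0, 0), (0, −a, 0), (0, a, 0), (0, 0, −a), (0, 0, a), followed by the single center point, an ordering inferred from the pattern of symbols in the two rows of H quoted above. The two missing yields are entered as NaN.

```python
import numpy as np

def model_row(x1, x2, x3):
    # Column order 0, 1, 2, 3, 11, 22, 33, 12, 23, 31.
    return [1.0, x1, x2, x3, x1**2, x2**2, x3**2, x1*x2, x2*x3, x3*x1]

a = 2.0 ** 0.75                                   # a = 1.681793, a^2 = 2*sqrt(2)
cube = [(x1, x2, x3) for x3 in (-1, 1) for x2 in (-1, 1) for x1 in (-1, 1)]
axial = [(-a, 0, 0), (a, 0, 0), (0, -a, 0), (0, a, 0), (0, 0, -a), (0, 0, a)]
X = np.array([model_row(*p) for p in cube + axial + [(0.0, 0.0, 0.0)]])

# Yields of Section 7; NaN marks the missing second and ninth results (c and d).
y = np.array([16, np.nan, 16, 7, 15, 8, 20, 5,
              np.nan, 0, 25, 18, 7, 12, 24], dtype=float)
miss = np.flatnonzero(np.isnan(y))
obs = np.flatnonzero(~np.isnan(y))

H = np.eye(len(y)) - X @ np.linalg.inv(X.T @ X) @ X.T
# Rows of H for the missing runs, applied to the completed y, must vanish:
#   H[miss, miss] y_miss + H[miss, obs] y_obs = 0.
estimates = np.linalg.solve(H[np.ix_(miss, miss)], -H[np.ix_(miss, obs)] @ y[obs])
c_hat, d_hat = estimates

# Cross-check: predictions from a least squares fit to the 13 observed runs.
b, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
predicted = X[miss] @ b
print(round(c_hat, 3), round(d_hat, 3))
```

The cross-check at the end reflects the fact that estimating the missing yields in this way is the same as predicting them from the surface fitted to the observed runs only.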

From the formulae given in section 6.2, P = 0.988364, Q = −0.337448,
R = 0.165212 and S = 0.102712. The estimates of the coefficients of the
second order response surface are found in the usual way and are shown in 
Table 3. The analysis of variance table, calculated as shown in section 3, follows. 


Source         Sum of Squares     d.f.     Mean Square

b                 2971.15          10
Residual            21.85           3         7.28

Total             2993.00          13


Using the value of $s^2$ just obtained, we can find two different sets of values for
the standard errors of the estimated coefficients b, incorrectly, by using the
diagonal terms of $(X'X)^{-1}s^2$, or correctly, by using the diagonal terms of
$(X_1'X_1)^{-1}s^2$ as given in section 3. In the latter case it is not necessary to calculate
the whole of the matrix, only the ten diagonal terms. Table 3 shows, in addition
to the values of the b coefficients, the standard errors which would be obtained
in the two cases. Columns labelled (1) are the values obtained when $(X_1'X_1)^{-1}s^2$
is used; columns labelled (2) are values obtained when $(X'X)^{-1}s^2$ is used.

The final column of Table 3 shows the ratio of the incorrect standard error (2)
to the correct standard error (1) for each coefficient. This ratio will always be
less than unity, since use of the incorrect variance-covariance matrix will give
standard errors that are smaller than the actual ones. It follows that confidence
statements for the values of the regression coefficients based on the incorrect
standard errors will be attributed a higher probability than is actually the case.
While care must be exercised, use of the incorrect variance-covariance matrix
will not, in most practical situations, appreciably affect the statistician's
conclusions.
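
The ratio of incorrect to correct standard error does not involve $s^2$, so it can be examined from the design alone. The sketch below is illustrative only: it assumes the same ordering of the cube plus octahedron design (n = 1) as inferred from the rows of H quoted in this section, removes the second and ninth runs, and compares the diagonal terms of the two matrices. It does not reproduce the entries of Table 3.

```python
import numpy as np

def model_row(x1, x2, x3):
    # Column order 0, 1, 2, 3, 11, 22, 33, 12, 23, 31.
    return [1.0, x1, x2, x3, x1**2, x2**2, x3**2, x1*x2, x2*x3, x3*x1]

a = 2.0 ** 0.75
cube = [(x1, x2, x3) for x3 in (-1, 1) for x2 in (-1, 1) for x1 in (-1, 1)]
axial = [(-a, 0, 0), (a, 0, 0), (0, -a, 0), (0, a, 0), (0, 0, -a), (0, 0, a)]
X = np.array([model_row(*p) for p in cube + axial + [(0.0, 0.0, 0.0)]])

missing_rows = [1, 8]                         # second and ninth runs (0-based indices)
X1 = np.delete(X, missing_rows, axis=0)       # rows of the observed runs only

var_incorrect = np.diag(np.linalg.inv(X.T @ X))      # diagonal of (X'X)^-1
var_correct = np.diag(np.linalg.inv(X1.T @ X1))      # diagonal of (X1'X1)^-1
ratio = np.sqrt(var_incorrect / var_correct)          # incorrect s.e. / correct s.e.

labels = ["b0", "b1", "b2", "b3", "b11", "b22", "b33", "b12", "b23", "b31"]
for name, value in zip(labels, ratio):
    print(f"{name:>4s}  {value:.3f}")
```

Because $(X_1'X_1)^{-1}$ exceeds $(X'X)^{-1}$ by a positive semi-definite matrix, no ratio computed this way can exceed one.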



























































































































TABLE 3 
The b coefficients and their standard errors 




If it is not desired to examine the standard errors of the individual coefficients 
then, after the estimation of the missing yields, the analysis can be continued 
in the way indicated by Cochran and Cox [10]. 


REFERENCES 


[1] Tocher, K. D., (1952), "The design and analysis of block experiments," Journal of the
Royal Statistical Society, Series B, Vol. 14, pp. 45-100.

[2] Box, G. E. P. and Hunter, J. S., (1957), "Multi-factor experimental designs," Annals of
Mathematical Statistics, Vol. 28, pp. 195-241.

[3] De Baun, Robert M., (1956), "Block effects in the determination of optimum
conditions," Biometrics, Vol. 12, pp. 20-22.

[4] Gardiner, D. A., Grandage, A. H. E., and Hader, R. J., (1959), "Third order rotatable
designs for exploring response surfaces," Annals of Mathematical Statistics, Vol. 30,
pp. 1082-1096.

[5] Bose, R. C. and Draper, Norman R., (1959), "Second order rotatable designs in three
dimensions," Annals of Mathematical Statistics, Vol. 30, pp. 1097-1112.

[6] Dykstra, O., (1960), "Partial duplication of response surface designs," Technometrics,
Vol. 2, pp. 185-195.

[7] De Baun, Robert M., (1959), "Response surface designs for three factors at three levels,"
Technometrics, Vol. 1, pp. 1-8, corrections p. 419.

[8] Box, G. E. P. and Behnken, D. W., (1960), "Some new three level designs for the study
of quantitative variables," Technometrics, Vol. 2, pp. 455-475.

[9] Draper, Norman R., "Missing value formulae for certain three factor, second order
response surface designs," Report No. 201, Mathematics Research Center, U. S. Army,
Madison, Wisconsin.

[10] Cochran, W. G. and Cox, G. M., Experimental Designs, second edition (1957), John
Wiley and Sons.








Vol. 3, No. 3 TECHNOMETRICS August, 1961












The Optimum Allocation of Spare 


Components in Systems¹


Donald F. Morrison²


National Institute of Mental Health, 
Bethesda, Md. 

















This paper considers a system whose components may be divided into two
subsystems, each containing components whose lives are exponentially distributed but
with different scale parameters. Each subsystem can be assigned a store of replacement
spares for failed components. Charts are presented for allocating a fixed number of
spare components between the two subsystems to provide maximum reliability over
some specified interval of time. Expected life is also taken as a maximand, and a table
is given of combinations of spares of each type so as to maximize system life. The
effect of non-exponential component densities upon these optimum allocations of
components is also discussed.


1. INTRODUCTION 



























Let us consider some complex system whose components may be grouped 
into subsystems according to the form of distribution function specifying their 
lengths of life. Failure of any component will cause the entire system to fail. 
Under the requirement that components of one type of life distribution are 
not interchangeable with those of another, each subsystem is assigned a store of 
similar spare components. System failures are corrected by successively re- 
placing failed components in the subsystems from the appropriate stores of 
spares until a failure occurs with no spare available for substitution. At this 
“final failure” the entire system is discarded. 

Examples of such systems with spare components are found in the ‘fly away” 
spare parts kits of military logistics, in the four original tires and single spare 
of an automobile, and in the sale of nylon hose in quantities greater than pairs. 
In each of these examples the environment is assumed to be such that no re- 
placements are available beyond those originally assigned to the store of spares, 
so that in general the results of renewal processes with an infinite number of 
renewals are not applicable. 

Cox [1] has treated the statistical properties of systems supplied with spare 
elements in the particular case of a single component type. Morrison and David 
[2, 3, 4] have also investigated the identical-component situation from a differ- 
ent approach, and have obtained expressions for the distribution of system life 
for small systems and a general number of spares, and conversely for n com- 
ponents and a small set of spares. Those expressions have been evaluated for 





1 Work supported, in part, by the Office of Ordnance Research, U. S. Army, Contract 
No. DA-36-034-ORD-1527 RD. 


2 On training leave at Virginia Polytechnic Institute, 1959-1960. 


certain gamma component densities, and tables of expected system life and 
charts of the associated distribution functions have been presented in [2]. 

Proschan and Black [5, 6] have considered the problem of assigning spares 
in some optimal fashion to several subsystems of dissimilar components that 
constitute a larger system. They allocate some total number of available spares 
in such a way that system reliability, or the probability that final system failure 
will not occur prior to some time x, is at a maximum. The number of available
spares is constrained by some linear budgetary inequality. An iterative scheme 
for computing the set with greatest reliability is given for components whose 
lives have distributions of the Polya type. 

It is the purpose of this discussion to give explicit solutions to a less general 
version of that programming problem for two subsystems and small numbers 
of available spares, under the assumption that the component lives are
exponentially distributed, although with different scale parameters, in each
subsystem. Charts are presented for allocating a fixed number of spares between
the two subsystems for maximum reliability on some specified interval of time. 
Expected system life is also taken as a maximand, and a table is given of combi- 
nations of spares of each type that maximize that quantity. Some attention is 
devoted to the effect of non-exponential component densities upon these opti- 
mum strategies. 


2. THE DISTRIBUTION OF SYSTEM LIFE


In general, let the ith of the m subsystems in the aggregate system have $n_i$
identical components and a store of $k_i$ similar spares. These elements have lives
distributed according to the continuous density function $f_i(x)$. Complete
stochastic independence is assumed among the lives of the $\sum_{i=1}^m n_i + \sum_{i=1}^m k_i$
elements. The life of the ith subsystem, that is, the time of the $(k_i + 1)$th failure
in that subsystem, will be written $L(n_i, k_i)$. From the autonomy of each
subsystem, the life of the complete system is


$$L(n_1, \ldots, n_m; k_1, \ldots, k_m) = \min_i L(n_i, k_i). \qquad (2.1)$$


If the reliability of the ith subsystem is denoted by
$$R(x; n_i, k_i) = P(L(n_i, k_i) > x), \qquad (2.2)$$
then system reliability is the product of these probabilities, or
$$R(x; n_1, \ldots, n_m; k_1, \ldots, k_m) = \prod_{i=1}^m R(x; n_i, k_i). \qquad (2.3)$$


Since system reliability is merely the complement of the cumulative
distribution function of system life, the density function of that variate follows from
differentiation and a change of sign in (2.3) to be


$$p(L) = \sum_{i=1}^m p^{(i)}(L) \prod_{j \ne i} R(L; n_j, k_j), \qquad (2.4)$$


where L = L(·), and $p^{(i)}(L)$ is the density function of the ith subsystem life.
Since the mean of a continuous positive random variable with distribution 











OPTIMUM ALLOCATION OF SPARE COMPONENTS 401 


function F(x) can be computed as $Ex = \int_0^\infty (1 - F(x))\,dx$, it will be convenient
to write expected system life as


$$EL(n_1, \ldots, n_m; k_1, \ldots, k_m) = \int_0^\infty \prod_{i=1}^m R(x; n_i, k_i)\,dx. \qquad (2.5)$$

The principal results of this paper are restricted to components with the
exponential life density
$$f_i(x) = a_i \exp(-a_ix), \qquad 0 < x,\ a_i < \infty, \qquad (2.6)$$


in the ith subsystem. From the random nature of failures characterized by the
exponential life density, it is known [2] that the life $L = L(n_i, k_i)$ of the ith
subsystem has the gamma density


$$p^{(i)}(L) = (k_i!)^{-1}(n_ia_i)^{k_i+1}L^{k_i} \exp(-n_ia_iL), \qquad (2.7)$$
and reliability
$$R(x; n_i, k_i) = \exp(-n_ia_ix) \sum_{h=0}^{k_i} (n_ia_ix)^h/h!. \qquad (2.8)$$


Thus,
$$R(x; n_1, \ldots, n_m; k_1, \ldots, k_m) = \exp\Bigl(-x \sum_{i=1}^m n_ia_i\Bigr) \prod_{i=1}^m \Bigl[\sum_{h_i=0}^{k_i} (n_ia_ix)^{h_i}/h_i!\Bigr]. \qquad (2.9)$$



3. THE ALLOCATION OF SPARES BETWEEN TWO SUBSYSTEMS
FOR MAXIMUM EXPECTED SYSTEM LIFE








The integration of the reliability function for m = 2 and component life
density (2.6) yields the expected system life


$$EL(n_1, n_2; k_1, k_2) = (n_1a_1 + n_2a_2)^{-1} \sum_{i=0}^{k_1} \sum_{j=0}^{k_2} \theta^i(1 - \theta)^j \frac{(i + j)!}{i!\,j!}, \qquad (3.1)$$
where $\theta = n_1a_1/(n_1a_1 + n_2a_2)$. The $k_1, k_2$ that maximize expected system
life under the constraint $k_1 + k_2 = K$ are determined by solving inequalities
in θ of the form










$$EL(n_1, n_2; k_1, K - k_1) \ge EL(n_1, n_2; k_1 + l, K - k_1 - l), \qquad (3.2)$$


$l = 1, \ldots, K - k_1$. Table 1 gives values of $k_1, k_2$ for K = 2(1)6. Figure 1
shows the maximum expectations for K = 6 in units of $(n_1a_1 + n_2a_2)^{-1}$.

As an example of the utility of Table 1, consider the system composed of
subsystems of $n_1 = 2$, $n_2 = 5$ components with scale parameters $a_1 = .1$,
$a_2 = .08$, respectively. Any six spares are available for inclusion with the system.
Since θ = 1/3, the table directs that $k_1 = 2$ spares be of type 1, while the remaining
four are of type 2. The expected life for such an allocation may be determined
from Figure 1 to be about 9.7 units.
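
The example can also be worked directly from (3.1) rather than from the table and chart. The sketch below is a minimal illustration; the function names are hypothetical, and the search simply evaluates (3.1) for every split of the K spares.

```python
from math import comb

def expected_life(n1, a1, n2, a2, k1, k2):
    """Expected system life from (3.1) for two exponential subsystems."""
    rate = n1 * a1 + n2 * a2
    theta = n1 * a1 / rate
    total = sum(comb(i + j, i) * theta**i * (1 - theta)**j
                for i in range(k1 + 1) for j in range(k2 + 1))
    return total / rate

def best_allocation(n1, a1, n2, a2, K):
    """Split (k1, k2) of K spares with the largest expected life."""
    return max(((k1, K - k1) for k1 in range(K + 1)),
               key=lambda ks: expected_life(n1, a1, n2, a2, *ks))

best = best_allocation(2, 0.1, 5, 0.08, 6)
life = expected_life(2, 0.1, 5, 0.08, *best)
print(best, round(life, 2))
```

With these inputs the search selects two spares of type 1 and four of type 2, in agreement with the table, and the expected life is the value read from Figure 1 multiplied by 1/(n_1a_1 + n_2a_2) = 1/0.6.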







TABLE 1


The Allocation of Spares for Maximum Expected Total Life
EL(n_1, n_2; k_1, k_2) in a System With Two Component
Types: n_1, n_2 Components, k_1, k_2 Spares


f_1(x) = a_1 exp (−a_1 x), f_2(x) = a_2 exp (−a_2 x)


θ = n_1a_1/(n_1a_1 + n_2a_2)


Total          θ_0 < θ < θ_1        Allocation of Spares
Spares K       θ_0       θ_1            k_1     k_2

   2           0         .232            0       2
               .232      .768            1       1
               .768      1               2       0

   3           0         .131            0       3
               .131      .500            1       2
               .500      .869            2       1
               .869      1               3       0

   4           0         .084            0       4
               .084      .347            1       3
               .347      .653            2       2
               .653      .916            3       1
               .916      1               4       0

   5           0         .058            0       5
               .058      .259            1       4
               .259      .500            2       3
               .500      .741            3       2
               .741      .942            4       1
               .942      1               5       0

   6           0         .042            0       6
               .042      .198            1       5
               .198      .394            2       4
               .394      .606            3       3
               .606      .802            4       2
               .802      .958            5       1
               .958      1               6       0


4. THE ALLOCATION OF SPARES BETWEEN TWO SUBSYSTEMS FOR
MAXIMUM SYSTEM RELIABILITY ON AN INTERVAL OF TIME


It is often of foremost importance that the system should operate satisfac- 
torily for some specified length of time. This requirement would be appropriate 


Figure 1. Maximum expected life EL(n_1, n_2; k, 6 − k) of a system with two component
types and six available spares. (Ordinate: (n_1a_1 + n_2a_2) EL(·); abscissa:
θ = n_1a_1/(n_1a_1 + n_2a_2).)





Figure 2. Regions specifying the allocation of k_1 and k_2 spares for maximum reliability,
k_1 + k_2 = 2.


for a missile flight of known duration, or for military communication
equipment that must last throughout a field operation of fixed length [5]. Solutions to
the equal component-cost version of Proschan and Black's programming
problem for two subsystems have been determined by solving inequalities akin to
(3.2) in the system reliabilities (2.9). Figures 2-6 present strategies of spare
allocation that maximize the system reliabilities for intervals of length x,
subsystem sizes $n_1, n_2$, component life densities $f_i(x) = a_i \exp(-a_ix)$, and
available spares K = 2(1)6.


Figure 3. Regions specifying the allocation of k_1 and k_2 spares for maximum reliability,
k_1 + k_2 = 3.


Figure 4. Regions specifying the allocation of k_1 and k_2 spares for maximum reliability,
k_1 + k_2 = 4.


As an example of the application of the charts, consider a system composed 
of two subsystems of two and five components, with exponential component 
lives with scale parameters .03 and .0625, respectively. Four spares are avail- 
able. It is desired that system reliability be at a maximum on the interval (0, 2). 
Figure 4 specifies that the maximum reliability will be attained for one spare 


assigned to the first subsystem and the remaining three assigned to the second. 
The actual reliability may be computed from (2.9) to be about .989. 
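
The figure quoted in this example can be confirmed from (2.9) without the chart. The sketch below is illustrative; the function names are hypothetical, and every split of the four spares is simply evaluated and compared.

```python
from math import exp, factorial

def subsystem_reliability(x, n, a, k):
    """Reliability (2.8): n components of rate a with k spares, over (0, x)."""
    m = n * a * x
    return exp(-m) * sum(m**h / factorial(h) for h in range(k + 1))

def system_reliability(x, sizes, rates, spares):
    """Reliability (2.9): product of the subsystem reliabilities."""
    result = 1.0
    for n, a, k in zip(sizes, rates, spares):
        result *= subsystem_reliability(x, n, a, k)
    return result

sizes, rates = (2, 5), (0.03, 0.0625)
K, x = 4, 2.0
best = max(((k1, K - k1) for k1 in range(K + 1)),
           key=lambda ks: system_reliability(x, sizes, rates, ks))
r_best = system_reliability(x, sizes, rates, best)
print(best, round(r_best, 4))
```

The search returns one spare for the first subsystem and three for the second, as Figure 4 specifies, with a reliability that agrees with the value of about .989 given above.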


Figure 5. Regions specifying the allocation of k_1 and k_2 spares for maximum reliability,
k_1 + k_2 = 5.


Figure 6. Regions specifying the allocation of k_1 and k_2 spares for maximum reliability,
k_1 + k_2 = 6.
5. AN ALLOCATION RULE FOR A NON-EXPONENTIAL
COMPONENT LIFE DENSITY


It will be of interest to compare the previous optimum allocation rules for
maximum expected system life with those for component life a $\tfrac{1}{2}\chi^2_4$ variate, or


$$f_i(x) = a_i^2x \exp(-a_ix), \qquad 0 < x,\ a_i < \infty, \qquad i = 1, 2. \qquad (5.1)$$


The reliability of a system with a single component is known from the
properties of the gamma density to be


$$R(x; 1, k) = \exp(-a_ix) \sum_{j=0}^{2k+1} (a_ix)^j/j!. \qquad (5.2)$$
It has been shown [2] that
$$R(x; 2, k) - R(x; 2, k - 1) = \tfrac{1}{2} \exp(-2a_ix)(2a_ix)^{2k}[(2k)!]^{-1} + \exp(-2a_ix)(2a_ix)^{2k+1}[(2k + 1)!]^{-1} + \tfrac{1}{2} \exp(-2a_ix)(2a_ix)^{2k+2}[(2k + 2)!]^{-1}, \qquad (5.3)$$
so that


$$R(x; 2, k) = \exp(-2a_ix) \sum_{j=0}^{2k+1} (2a_ix)^j/j! + \tfrac{1}{2} \exp(-2a_ix)(2a_ix)^{2k+2}/(2k + 2)!. \qquad (5.4)$$


For two subsystems and two available spares, inequalities of the form (3.2)
have been solved in the parameter $\theta = n_1a_1/(n_1a_1 + n_2a_2)$ for the three different
combinations of one- and two-component subsystems. The rather lengthy
polynomials in θ will be omitted. The values of $k_1, k_2$ that maximize the mean life
are shown in Table 2. In each of the three cases the range in θ for which an





TABLE 2
The Allocation of Two Spares for Maximum Expected Life
EL(n_1, n_2; k_1, k_2) in a System with Two Component
Types: n_1, n_2 Components, k_1, k_2 Spares
f_1(x) = a_1² x exp (−a_1 x), f_2(x) = a_2² x exp (−a_2 x)
θ = n_1a_1/(n_1a_1 + n_2a_2)


                     θ_0 < θ < θ_1
                                                        Allocation
 n_1, n_2 = 1      n_1 = 2, n_2 = 1     n_1, n_2 = 2    of Spares
 θ_0     θ_1        θ_0     θ_1          θ_0     θ_1     k_1   k_2
 0       .282       0       .344         0       .309     0     2
 .282    .718       .344    .750         .309    .691     1     1
 .718    1          .750    1            .691    1        2     0

equal split of the two spares is implied is narrower than that for the exponential 
component life in Table 1. 

I am indebted to Dr. H. A. David, Virginia Polytechnic Institute, for sug- 
gesting the problem of the statistical properties of systems with spare com- 
ponents, and for many helpful discussions in the course of the investigation. 
This particular case of several distinct subsystems was proposed by Mr. Nathan 
Mantel, National Cancer Institute. Mr. Walter Johnston prepared the various 
charts. 


REFERENCES 

1. Cox, D. R., "A Renewal Problem with Bulk Ordering of Components," J. Royal Stat. Soc.
(B), Vol. 21 (1959), pp. 180-189.

2. Morrison, D. F., and David, H. A., Technical Report No. 45, Life Distribution and
Reliability of a System with Spare Components, Department of Statistics and Statistical
Laboratory, Virginia Polytechnic Institute, Blacksburg, Virginia, 1959.

3. Morrison, D. F., Technical Report No. 46, Some Further Results on Systems with Spare
Components, Department of Statistics and Statistical Laboratory, Virginia Polytechnic
Institute, Blacksburg, Virginia, 1960.

4. Morrison, D. F., and David, H. A., "The Life Distribution and Reliability of a System
with Spare Components," Annals of Mathematical Statistics, Vol. 31, (1960), pp. 1084-1094.

5. Black, G., and Proschan, F., "On Optimal Redundancy," Operations Research, Vol.
7 (1959), pp. 581-588.

6. Proschan, F., Polya Type Distributions in Renewal Theory, with an Application to an
Inventory Problem, Englewood Cliffs, N. J., Prentice-Hall, Inc., 1960.





TECHNOMETRICS August, 1961


Use of Tables of Percentage Points of Range and 
Studentized Range 


H. Leon Harter 


Aeronautical Research Laboratory 
Wright-Patterson Air Force Base 


A description is given of new tables, more extensive and more accurate than any 
previously published, of the percentage points of the range and of the studentized 
range, for samples from a normal population. The purpose of this paper is to call 
attention to these tables and to illustrate their use. The following examples of the 
use of the tables are given: (1) an application of the percentage points of the range 
to tests of hypotheses concerning the standard deviation of a normal population; 
(2) an application of the percentage points of the range to rejection of outliers; and (3) 
an application of the percentage points of the studentized range to multiple com- 
parisons tests on means of samples from a normal population. 


1. DESCRIPTION OF THE TABLES 


1.1 Percentage Points of the Range. Based upon a new table of the probability
integral P(W, n) of the range W for samples of size n from N(μ, 1) [the normal
population with mean μ and variance one], Harter and Clemm [8] have computed
a new table of the percentage points of the range [values of the range W
corresponding to cumulative probability P for samples of size n from N(μ, 1)].
Results are given to six decimal places, accurate to within a unit in the last place,
for P = .0001, .0005, .001, .005, .01, .025, .05, .1(.1).9, .95, .975, .99, .995, .999,
.9995, .9999 and n = 2(1)20(2)40(10)100. This table is also included in a paper
by Harter [7].

1.2 Percentage Points of the Studentized Range. Harter, Clemm, and Guthrie
[9] have computed a new table of the percentage points of the studentized range
[values of the studentized range Q = w/s corresponding to cumulative
probability P for samples of size n from a normal population, with ν degrees of
freedom for the independent estimate s² of the population variance]. This table is
based upon a new table of the probability integral P(Q, ν, n) of the studentized
range Q = w/s for samples of size n from a normal population with ν degrees
of freedom for the independent estimate s² of the population variance. The
percentage points are given to four significant figures or four decimal places
(whichever is less accurate), accurate to within a unit in the last place given,
for P = .001, .005, .01, .025, .05, .1(.1).9, .95, .975, .99, .995, .999;
n = 2(1)20(2)40(10)100; and ν = 1(1)20, 24, 30, 40, 60, 120, ∞. An
abridgment of this table, for P = .9, .95, .975, .99, .995, .999 and all of the above
values of n and ν, is included in the aforementioned paper by Harter [7].

Harter, Clemm, and Guthrie [9] have also computed a new table of critical
values for Duncan's new multiple range test [values of the studentized range
Q = w/s corresponding to protection level $P = (1 - \alpha)^{p-1}$ (significance level α)
for testing p successive values out of an ordered arrangement of m means of
samples from a normal population, with ν degrees of freedom for the independent
estimate s² of the population variance]. This table of special percentage points
of the studentized range has the same accuracy as the table of regular
percentage points, and is based upon the same table of the probability integral. Critical
values of Q are given for $P = (1 - \alpha)^{p-1}$, with α = .10, .05, .01, .005, .001;
p = 2(1)20(2)40(10)100; and ν = 1(1)20, 24, 30, 40, 60, 120, ∞. This table is
also included in a paper by Harter [6]. Special attention is called to this table
because it corrects some rather sizeable inaccuracies in previously published
tables (see [3]).


2. EXAMPLES OF THE USE OF THE TABLES 


2.1 Tests of Hypotheses Concerning the Standard Deviation of a Normal
Population. Suppose one wishes to test the hypothesis H_0: σ = 1 against the
class of alternatives specifying σ > 1 for a sample of size n from a population
which is known or assumed to be normal, using a test having Type I error rate
α. If one takes n = 10 and α = .10, one finds ([7], Table 1 or [8], Table 1.2)
that the critical region for the test based on the sample range w is w > 4.129.
Consider now the following two samples:

Sample (1): −2.015, −0.623, −0.699, 0.481, −0.586, −0.579, −0.120,
0.191, 0.071, −3.001;

Sample (2): −0.172, 0.655, 3.345, −1.677, 1.030, −0.912, −0.004, 0.560,
−1.568, 0.303.

The ranges of these two samples are respectively w_1 = 3.482 and w_2 = 5.022;
hence one accepts H_0 for Sample (1) and rejects H_0 for Sample (2). The
conclusion is correct in each case, since both samples were drawn from tables of
random normal numbers given by Dixon and Massey ([2], pp. 295-304 and
355-359), Sample (1) from a table for μ = 0, σ = 1 and Sample (2) from a table
for μ = 0, σ = 2. The test based on the range is less powerful than the test
based on the sample variance s², which is performed by comparing the ratio
ns²/σ² with a critical value read from a table of percentage points of the
χ² distribution. The range test is, however, much simpler to apply, and often
one chooses the simpler of two tests rather than the more powerful one,
especially when the difference in power is not great.
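
The arithmetic of this example is easily reproduced. The short sketch below uses the two samples and the critical value 4.129 quoted above; the function and variable names are illustrative only.

```python
sample_1 = [-2.015, -0.623, -0.699, 0.481, -0.586,
            -0.579, -0.120, 0.191, 0.071, -3.001]
sample_2 = [-0.172, 0.655, 3.345, -1.677, 1.030,
            -0.912, -0.004, 0.560, -1.568, 0.303]

CRITICAL_RANGE = 4.129   # quoted in the text for n = 10 and a Type I error rate of .10

def sample_range(values):
    """Range w: largest observation minus smallest observation."""
    return max(values) - min(values)

w1, w2 = sample_range(sample_1), sample_range(sample_2)
reject_1 = w1 > CRITICAL_RANGE   # test of sigma = 1 against sigma > 1
reject_2 = w2 > CRITICAL_RANGE
print(round(w1, 3), reject_1, round(w2, 3), reject_2)
```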

2.2 Rejection of Outliers. Suppose one wishes to know whether a sample
from a population, assumed to be normal with known standard deviation σ,
has been contaminated by the inclusion of one or more values from a normal
population with a different mean or a different standard deviation. Dixon [1]
has studied the power and other properties of various tests for the rejection of
outliers, and has recommended the use of the standardized range W = w/σ as a
test statistic. As an example, consider the following sample, purported to have
come from a normal population with σ = 1, e.g., 0.508, 0.946, −0.007, −1.324,
−0.510, 0.688, −0.834, 1.158, 0.309, −0.729, −1.222, 2.017, −2.172, 0.623,
2.698. In cases of this kind, one may wish to test the null hypothesis that all
values actually came from the same normal population with σ = 1 against the
alternative hypothesis that the value differing most from the sample mean,
in this case the last value (2.698), came from a population with a different mean





USE OF TABLES OF PERCENTAGE POINTS 409 


or a larger standard deviation. To do this, one may compare the standardized
range W = w/σ = [2.698 − (−2.172)]/1 = 4.870 with the 95% point for the
distribution of the range of samples of size 15 from N(μ, 1). The latter is found
([7], Table 1 or [8], Table 1.2) to be 4.796; hence the null hypothesis is rejected,
and one concludes that the outlier (2.698) came from a different population.
The conclusion is correct in this case, since the first 14 values in the sample
came from a table of random normal numbers for μ = 0, σ = 1 and the last
one from such a table for μ = 2, σ = 1, both tables given by Dixon and Massey
([2], pp. 295-304 and 350-354).

2.3 Multiple Comparisons Tests on Means of Samples from a Normal
Population. Suppose one has performed an analysis of variance and has found that
the means of m groups, each of size k, differ significantly overall. Let s² be
the within-groups variance, based upon ν = m(k − 1) degrees of freedom.
Now one wishes to know which group means differ significantly from each other.
Various tests have been proposed, including several based on the range. Two of
these, the Newman-Keuls test and Duncan's new multiple range test, will be
illustrated here. If either of these is to be used, one computes the standard
error of the mean, $s_{\bar{x}} = \sqrt{s^2/k}$. He also arranges the group means in order from
the smallest, $\bar{x}_1$, to the largest, $\bar{x}_m$, and determines their range, $\bar{x}_m - \bar{x}_1$. He
then compares the range with the critical range obtained by multiplying $s_{\bar{x}}$
by the significant studentized range (critical value) for sample size m and degrees
of freedom ν for the test under consideration. If the range exceeds the critical
range, he next compares $\bar{x}_m - \bar{x}_2$ and $\bar{x}_{m-1} - \bar{x}_1$ with the critical range obtained
by multiplying $s_{\bar{x}}$ by the significant studentized range for sample size (m − 1)
and degrees of freedom ν for the test under consideration, and so on until no
further significant ranges are found. Whenever the range of any group is found
to be non-significant, one concludes that the entire group has come from a
homogeneous source, and no test is made on the range of any subgroup of that
group.

Now consider the analysis of data obtained from an experiment performed 
in the Aero Medical Laboratory of Wright Air Development Center to test the 
effect of diet on total water intake (grams) of laboratory rats during a 4-day 
period. Five diets were studied: no food and 8 calories per rat per day of each 
of four foods. One hundred rats were divided randomly into five groups of 20 
rats each, one group on each diet, and the total water intake in grams of each 
rat for a 4-day period was determined. An analysis of variance applied to the 
data gave the following results: 


Source of Variation       d.f.      S.S.      M.S.       F

Among diets                 4      34761      8690     5.7**
Within diets (error)       95     144926      1526

Total                      99     179687
**Significant at the 1% level
The mean water intake for the various diets was as follows: 


Glucose    No Food    Olive Oil  Lab Chow   Casein
80.25      99.95      107.35     121.50     134.90





410 H. LEON HARTER 


Having found that the overall effect of diets is significant, the experimenter
next wanted to know which diet means differ significantly from each other.
This question can be answered by the use of one of the multiple comparisons
procedures, several of which are based on the studentized range. The first step is
the computation of the standard error of the mean, $s_{\bar{x}} = (1526/20)^{1/2} = (76.30)^{1/2} =
8.735$. Next one multiplies $s_{\bar{x}}$ by the significant studentized range (critical
value), for p means (p = 5, 4, 3, 2) with 95 degrees of freedom for $s^2$, for the
test under consideration. The significant studentized ranges for the Newman-Keuls
test can be read from [7], Table 3 or from [9], Table II.2 and those for
Duncan's test from [6], Table 1 or from [9], Table II.3, where $P = 1 - \alpha$ for
the Newman-Keuls test and $P = (1 - \alpha)^{p-1}$ for Duncan's test. For purposes
of illustration, consider 5% and 1% level tests ($\alpha = .05, .01$). Since there are no
values in the table for $\nu = 95$, it is necessary to apply linear harmonic interpolation
between the tabular values for $\nu = 60$ and $\nu = 120$. Since 1/95 is 0.7369
of the way from 1/60 to 1/120, the values for $\nu = 95$ are 0.7369 of the way from
those for $\nu = 60$ to those for $\nu = 120$. One finds the following values for the
significant studentized range of p means at significance level $\alpha$ with 95 degrees
of freedom for $s^2$:


          Newman-Keuls test         Duncan's test
  p     α = .05     α = .01      α = .05     α = .01

  5      3.933       4.738        3.123       4.062
  4      3.699       4.523        3.052       3.982
  3      3.367       4.222        2.955       3.875
  2      2.808       3.718        2.808       3.718
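
Linear harmonic interpolation, as described above, is linear interpolation in $1/\nu$ rather than in $\nu$. The following is a minimal sketch of that rule; the function names are illustrative assumptions, and the tabular values at $\nu = 60$ and $\nu = 120$ are left as arguments because they are not reproduced in this paper.

```python
def harmonic_weight(v, v_lo, v_hi):
    """Fraction of the way from 1/v_lo to 1/v_hi at which 1/v lies."""
    return (1.0 / v_lo - 1.0 / v) / (1.0 / v_lo - 1.0 / v_hi)


def harmonic_interpolate(v, v_lo, v_hi, q_lo, q_hi):
    """Interpolate a tabulated value linearly in the reciprocal of the
    degrees of freedom; q_lo and q_hi are the tabular values at v_lo, v_hi."""
    w = harmonic_weight(v, v_lo, v_hi)
    return q_lo + w * (q_hi - q_lo)


# Weight used in the text for 95 degrees of freedom between 60 and 120.
weight = harmonic_weight(95, 60, 120)
print(round(weight, 4))
```

The exact weight is 14/19, which is about 0.7368 and agrees with the quoted 0.7369 to three decimal places.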


By multiplying each of the above significant studentized ranges by the standard
error of the mean, $s_{\bar{x}} = 8.735$, one finds the following values for the significant
range of p means at significance level $\alpha$ with 95 degrees of freedom for $s^2$:


          Newman-Keuls test         Duncan's test
  p     α = .05     α = .01      α = .05     α = .01

  5      34.35       41.39        27.28       35.48
  4      32.31       39.51        26.66       34.78
  3      29.41       36.88        25.81       33.85
  2      24.53       32.48        24.53       32.48
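
The significant ranges are simply the tabulated studentized ranges scaled by the standard error of the mean. A brief sketch of that computation follows, using only numbers quoted in the text; the dictionary layout and names are assumptions made for illustration.

```python
import math

MS_WITHIN = 1526.0   # within-diets mean square from the analysis of variance
GROUP_SIZE = 20      # rats per diet

# Standard error of a diet mean, rounded to three decimals as in the text.
se_mean = round(math.sqrt(MS_WITHIN / GROUP_SIZE), 3)

# Significant studentized ranges for p = 5, 4, 3, 2 means (95 d.f.).
studentized = {
    "Newman-Keuls .05": {5: 3.933, 4: 3.699, 3: 3.367, 2: 2.808},
    "Newman-Keuls .01": {5: 4.738, 4: 4.523, 3: 4.222, 2: 3.718},
    "Duncan .05": {5: 3.123, 4: 3.052, 3: 2.955, 2: 2.808},
    "Duncan .01": {5: 4.062, 4: 3.982, 3: 3.875, 2: 3.718},
}

# Significant range = studentized range times the standard error of the mean.
significant = {
    test: {p: q * se_mean for p, q in table.items()}
    for test, table in studentized.items()
}

for test, table in significant.items():
    print(test, {p: round(value, 2) for p, value in table.items()})
```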


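The results of the tests are as follows, where any two means not underscored
by the same line differ significantly at the indicated level:


Test and Significance Level   Glucose    No Food    Olive Oil  Lab Chow   Casein

Newman-Keuls 5%               80.25      99.95      107.35     121.50     134.90
                              ____________________________
                                         ____________________________
                                                    ____________________________

Newman-Keuls 1%               80.25      99.95      107.35     121.50     134.90
                              ____________________________
                                         _______________________________________

Duncan 5%                     80.25      99.95      107.35     121.50     134.90
                              ________________
                                         ____________________________
                                                               _________________

Duncan 1%                     80.25      99.95      107.35     121.50     134.90
                              ____________________________
                                         ____________________________
                                                    ____________________________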
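
The stepwise procedure of Section 2.3 can be sketched in a few lines. The means and significant ranges below are those quoted above; the function `homogeneous_groups` and its return format are illustrative assumptions rather than anything given in the paper. A set of adjacent ordered means is tested only if it is not contained in a set already found non-significant.

```python
DIETS = ["Glucose", "No Food", "Olive Oil", "Lab Chow", "Casein"]
MEANS = [80.25, 99.95, 107.35, 121.50, 134.90]   # already in ascending order

# Significant ranges for p means, copied from the table in the text.
CRITICAL = {
    "Newman-Keuls 5%": {5: 34.35, 4: 32.31, 3: 29.41, 2: 24.53},
    "Newman-Keuls 1%": {5: 41.39, 4: 39.51, 3: 36.88, 2: 32.48},
    "Duncan 5%": {5: 27.28, 4: 26.66, 3: 25.81, 2: 24.53},
    "Duncan 1%": {5: 35.48, 4: 34.78, 3: 33.85, 2: 32.48},
}


def homogeneous_groups(means, critical):
    """Return index pairs (i, j) of maximal runs of ordered means whose
    range does not exceed the significant range for their size."""
    m = len(means)
    groups = []
    for p in range(m, 1, -1):              # largest sets of means first
        for i in range(m - p + 1):
            j = i + p - 1
            # No test is made on a subgroup of a homogeneous group.
            if any(a <= i and j <= b for a, b in groups):
                continue
            if means[j] - means[i] <= critical[p]:
                groups.append((i, j))
    return sorted(groups)


groups = {
    test: [DIETS[i:j + 1] for i, j in homogeneous_groups(MEANS, crit)]
    for test, crit in CRITICAL.items()
}

for test, found in groups.items():
    print(test, found)
```

Each returned list corresponds to one underscoring line: means sharing a list are not declared different at that level.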










Keuls [10] and Duncan [3] have given a fuller discussion of these tests. Harter 
[5] has made a study of error rates and sample sizes for multiple comparisons 
tests based on the range, while Federer [4] has given a good textbook discussion 
of these and other multiple comparisons tests. 


REFERENCES 


[1] Dixon, W. J., "Analysis of extreme values", Annals of Mathematical Statistics, Vol.
21 (1950), pp. 488-506.

[2] Dixon, Wilfrid J. and Massey, Frank J., Jr., Introduction to Statistical Analysis,
McGraw-Hill Book Co., Inc., New York, 1951.

[3] Duncan, David B., "Multiple range and multiple F tests", Biometrics, Vol. 11 (1955),
pp. 1-42.

[4] Federer, Walter T., Experimental Design, The Macmillan Company, New York, 1955,
pp. 18-45.

[5] Harter, H. Leon, "Error rates and sample sizes for range tests in multiple comparisons",
Biometrics, Vol. 13 (1957), pp. 511-536.

[6] Harter, H. Leon, "Critical values for Duncan's new multiple range test", Biometrics,
Vol. 16 (1960), pp. 671-685.

[7] Harter, H. Leon, "Tables of range and studentized range", Annals of Mathematical
Statistics, Vol. 31 (1960), pp. 1122-1147.

[8]* Harter, H. Leon and Clemm, Donald S., "The Probability Integrals of the Range and
of the Studentized Range: Probability Integral, Percentage Points, and Moments of
the Range", Wright Air Development Center Technical Report 58-484, Vol. I, 1959.
(ASTIA Document No. AD 215024).

[9]* Harter, H. Leon; Clemm, Donald S.; and Guthrie, Eugene H., "The Probability
Integrals of the Range and of the Studentized Range: Probability Integral and Percentage
Points of the Studentized Range; Critical Values for Duncan's New Multiple
Range Test", Wright Air Development Center Technical Report 58-484, Vol. II, 1959.
(ASTIA Document No. AD 231733).

[10] Keuls, M., "The use of the 'studentized range' in connection with an analysis of variance",
Euphytica, Vol. 1 (1952), pp. 112-122.


* Available to the public from Office of Technical Services, U.S. Department of Commerce, 
Washington 25, D. C. or to qualified requesters from Armed Services Technical Information 
Agency, Arlington Hall Station, Arlington 12, Virginia. 








Vol. 3, No. 3 TECHNOMETRICS August, 1961


The Reliability of Components Exhibiting Cumulative Damage Effects*


George H. Weiss**


U.S. Naval Ordnance Laboratory
White Oak, Maryland


Environmental conditions do not usually remain fixed over the life history of an 
operating mechanism. This paper discusses several simple reliability functions and 
their moments taking into account possible hereditary effects, random operating 
conditions and other environmental influences. 


FAILURE MODELS OF CURRENT INTEREST


There have been many studies published on the reliability of mechanisms
which function in a fixed environment; so many, in fact, that the main parameters
which describe the failure characteristics of such mechanisms may be said to be
common knowledge among engineers. However, mechanisms do not usually
function in a fixed environment, and are subject to random usage times, random
environmental influences, and other factors which tend to introduce complications
into the description of the failure rates of mechanisms by a single reliability
function. It is the purpose of this paper to study several simple models which
incorporate random operating and environmental effects in the derivation of the
reliability functions and their moments.

As a possibly more concrete way of motivating this study we may consider
the hazard, or failure rate, of a vacuum tube, which may be idealized in the way
illustrated in Fig. 1. The initially high failure-rate period, in which there is a
high "infant mortality," is followed by a period during which the failure rate
remains reasonably constant, and the reliability function is an exponential.
Finally, there is a period of wearout failures, when the failures again begin to
increase. If the times $t_1$, marking the end of the infant mortality phase, and $t_2$,
marking the end of the constant failure rate phase, were known, it would be a
simple matter to set down an expression for the reliability function. It is more
often the case that the times $t_1$ and $t_2$ depend on the history of the vacuum tube,
i.e., how long it was used during each usage period, and the environmental
influences which it received. In particular, there has been a good deal of interest
lately in models for the development of reliability in prototype models. H. K.
Weiss¹ has studied the estimation of parameters for a mathematical model in
which the device under development is subject to a number of flaws which may
be remedied, one by one, in the design and testing stages. This model can be


* This paper appeared in a modified form in the 1960 Convention Record of the IRE. 
** Present Address: Institute for Fluid Dynamics and Applied Mathematics, University 
of Maryland, College Park, Md. 


[Figure 1. Plot of Idealized Failure Rate. Vertical axis: failure rate; horizontal axis: time t.]


studied by the same methods as are used in this paper. Another situation of 
some interest from the point of view of the present analysis concerns the relia- 
bility of batteries, whose useful lifetimes are influenced by the shelf time and 
previous operating history. 


SIMPLE MODEL INVOLVING THE HEREDITARY EFFECTS OF USE


The first model to be analyzed is one which incorporates, in the simplest
possible way, the hereditary effects of use. Let us then consider a device which
is called on to function at a set of times $t_i$, where the $t_i$ may or may not be random
variables. The device in question, which for simplicity we call a relay, can be in
either one of two states, operable or non-operable, and performs its function
instantaneously.

The following parameters will describe the model to be analyzed:

$\Omega(t - \tau)$: The probability that the relay will be operable at time $t$ conditional
on its operability at time $\tau$.

$R(t)$: The reliability or absolute probability of operability at time $t$.

$Q(t)$: The probability that two successive times of operation of the relay differ
by more than $t$ units of time.

$E(t^k)$: The kth moment of the time to failure, defined by

$$E(t^k) = -\int_0^\infty t^k \, dR(t) = k \int_0^\infty t^{k-1} R(t) \, dt. \tag{1}$$

The probability that the relay will be called on to operate at a time $t$ after a
given operation is therefore given by $-dQ(t)$. If $Q(t)$ is differentiable we define
a function $q(t)$ by

$$q(t) = -Q'(t) \tag{2}$$

such that $q(t)\,dt$ is the probability that the relay is called on to operate
for the first time at an instant $T \in (t, t + dt)$ after a given operation. If we allow
$q(t)$ to consist of delta functions then all of the essential cases will be covered.
We will make this assumption and henceforth deal only with $q(t)$.

We can now set down the integral equation relating the reliability function
$R(t)$ to the known functions $\Omega(t)$ and $q(t)$. This equation is

$$R(t) = Q(t)\,\Omega(t) + \int_0^t \Omega(\tau)\, q(\tau)\, R(t - \tau) \, d\tau. \tag{3}$$





It is valid provided we assume that at time $t = 0$ the relay was successfully
operated. In that case we can allow for two possibilities in a derivation of Eq. (3).
The first is that the operation at $t = 0$ was the only one before time $t$ and the
relay is still operable; this occurs with probability $Q(t)\,\Omega(t)$. The second
possibility is that the very first time of operation was $\tau$ and the relay was
thereafter operable. This yields the second term in Eq. (3). A similar integral
equation was derived in studying random replacement policies in reference 2.
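
Equation (3) is a Volterra equation of the second kind, so it can also be solved numerically by marching forward in time. The following is a minimal sketch using the trapezoid rule, assuming that $q(t)$ is an ordinary density (no delta functions); the function name, step size and parameter values are illustrative assumptions. With exponential demand intervals of rate $\rho$ and exponential conditional reliability of rate $\lambda$, the computed $R(t)$ should agree with $\exp(-\lambda t)$.

```python
import math


def solve_reliability(Q, q, Omega, t_max, h):
    """Solve R(t) = Q(t) Omega(t) + int_0^t Omega(tau) q(tau) R(t - tau) dtau
    on a uniform grid of step h, using the trapezoid rule."""
    n = int(round(t_max / h))
    t = [i * h for i in range(n + 1)]
    U = [Q(x) * Omega(x) for x in t]   # first term of the equation
    V = [q(x) * Omega(x) for x in t]   # kernel of the convolution
    R = [U[0]]
    for i in range(1, n + 1):
        acc = 0.5 * V[i] * R[0]        # end point tau = t_i
        for j in range(1, i):
            acc += V[j] * R[i - j]     # interior points
        # The tau = 0 end point involves the unknown R[i]; solve for it.
        R.append((U[i] + h * acc) / (1.0 - 0.5 * h * V[0]))
    return t, R


lam, rho = 0.5, 2.0   # hypothetical failure and demand rates
times, R = solve_reliability(
    Q=lambda x: math.exp(-rho * x),
    q=lambda x: rho * math.exp(-rho * x),
    Omega=lambda x: math.exp(-lam * x),
    t_max=4.0,
    h=0.01,
)
max_error = max(abs(r - math.exp(-lam * x)) for x, r in zip(times, R))
print(f"largest deviation from exp(-lambda t): {max_error:.2e}")
```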

Equation (3) can be solved by noting that it is in the convolution form and
is greatly simplified by introducing Laplace transforms. To simplify notation
we define two new functions

$$U(t) = Q(t)\,\Omega(t), \qquad V(t) = q(t)\,\Omega(t) \tag{4}$$

in terms of which Eq. (3) can be written

$$R(t) = U(t) + \int_0^t R(t - \tau)\, V(\tau) \, d\tau. \tag{5}$$

Denoting the Laplace transform of a function of time by that same function
with argument $s$ and an asterisk, e.g.,

$$F^*(s) = \int_0^\infty e^{-st} F(t) \, dt, \tag{6}$$

we find

$$R^*(s) = \frac{U^*(s)}{1 - V^*(s)}. \tag{7}$$

The moments of the time to failure are given by

$$E(t^k) = (-1)^{k-1}\, k \left. \frac{d^{k-1} R^*(s)}{ds^{k-1}} \right|_{s=0}. \tag{8}$$

In particular the first two moments are given by

$$E(t) = \frac{U^*(0)}{1 - V^*(0)}, \qquad
E(t^2) = -2\, \frac{[1 - V^*(0)]\, U^{*\prime}(0) + U^*(0)\, V^{*\prime}(0)}{[1 - V^*(0)]^2}. \tag{9}$$


The simplest case for which an explicit solution can be given is for a Poisson
demand and Poisson failure rate, i.e.,

$$Q(t) = \exp(-\rho t), \qquad \Omega(t) = \exp(-\lambda t), \tag{10}$$

so that

$$U(t) = \exp[-(\lambda + \rho)t], \qquad V(t) = \rho \exp[-(\lambda + \rho)t]. \tag{11}$$

The inversion of Eq. (7) for these functions yields the expression for $R(t)$:

$$R(t) = \exp(-\lambda t). \tag{12}$$





[Figure 2. Generic Plot of $-R'(t)$ for Equation (18), with the maximum marked at $t_m$.]


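If the distribution of operating times is an exponential as in Eq. (10), $U^*(s)$
and $V^*(s)$ are given by

$$U^*(s) = \Omega^*(s + \rho), \qquad V^*(s) = \rho\, \Omega^*(s + \rho) \tag{13}$$

and the expressions for the first two moments are

$$E(t) = \frac{\Omega^*(\rho)}{1 - \rho\, \Omega^*(\rho)}, \qquad
E(t^2) = \frac{-2\, \Omega^{*\prime}(\rho)}{[1 - \rho\, \Omega^*(\rho)]^2}. \tag{14}$$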


As an example of a case in which the failure rate is favorably affected by
frequent use consider the particular choice for $\Omega(t)$:

$$\Omega(t) = e^{-\lambda t}(1 + \lambda t) \tag{15}$$

with the transform

$$\Omega^*(s) = \frac{s + 2\lambda}{(s + \lambda)^2}. \tag{16}$$

The conditional failure rate or hazard is

$$\frac{-\Omega'(t)}{\Omega(t)} = \frac{\lambda^2 t}{1 + \lambda t}. \tag{17}$$


We see that the failure rate is zero immediately after the relay is used and
thereafter rises to a constant value. This is a reasonable model for a relay which
has a tendency to "stick" if not frequently used. From Eq. (7) we are led to the
following expression for $R(t)$:

$$R(t) = \frac{1}{2\theta}(\rho + 2\lambda + \theta) \exp\left[-\left(\frac{\rho + 2\lambda - \theta}{2}\right)t\right]
- \frac{1}{2\theta}(\rho + 2\lambda - \theta) \exp\left[-\left(\frac{\rho + 2\lambda + \theta}{2}\right)t\right] \tag{18}$$

where $\theta = (\rho^2 + 4\lambda\rho)^{1/2}$. Of somewhat greater interest is the function $-R'(t)\,dt$
which is the probability that the mechanism will fail during the time interval
$(t, t + dt)$. A generic curve of this function is plotted in Figure 2. It is seen to
have a single maximum located at

$$t_m = \frac{1}{\theta} \log \frac{\rho + 2\lambda + \theta}{\rho + 2\lambda - \theta}. \tag{19}$$

The first moment and variance are easily found from Eq. (14),

$$E(t) = \frac{\rho + 2\lambda}{\lambda^2}, \qquad
\sigma^2 = E(t^2) - E^2(t) = \frac{\rho^2 + 4\lambda\rho + 2\lambda^2}{\lambda^4}. \tag{20}$$








It can be seen that the first moment, or mean time to failure, increases linearly
with $\rho$, and decreases with $\lambda$. These results are reasonable, since a large $\rho$ means
that the expected time between calls for the relay is small so that most of the
time a small failure rate prevails.
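
The closed forms for the "sticking" relay can be cross-checked numerically. The sketch below codes the reliability function of Eq. (18), integrates it to obtain the mean time to failure, and locates the maximum of the failure density; it assumes the demand rate is written $\rho$ and the failure parameter $\lambda$, and the parameter values and helper names are hypothetical.

```python
import math


def reliability(t, lam, rho):
    """R(t) of Eq. (18), with theta = sqrt(rho^2 + 4 lam rho)."""
    theta = math.sqrt(rho ** 2 + 4.0 * lam * rho)
    c = rho + 2.0 * lam
    return ((c + theta) * math.exp(-(c - theta) * t / 2.0)
            - (c - theta) * math.exp(-(c + theta) * t / 2.0)) / (2.0 * theta)


def failure_density(t, lam, rho, eps=1e-5):
    """-R'(t), estimated by a central difference."""
    return -(reliability(t + eps, lam, rho)
             - reliability(t - eps, lam, rho)) / (2.0 * eps)


def time_of_maximum(lam, rho):
    """Location of the maximum of -R'(t), Eq. (19)."""
    theta = math.sqrt(rho ** 2 + 4.0 * lam * rho)
    c = rho + 2.0 * lam
    return math.log((c + theta) / (c - theta)) / theta


lam, rho = 1.0, 3.0            # hypothetical parameter values
h, t_end = 0.01, 100.0         # integration grid for the mean
n = int(round(t_end / h))

# E(t) is the integral of R(t) over (0, infinity); trapezoid rule here.
numeric_mean = h * (0.5 * reliability(0.0, lam, rho)
                    + sum(reliability(i * h, lam, rho) for i in range(1, n))
                    + 0.5 * reliability(t_end, lam, rho))
closed_form_mean = (rho + 2.0 * lam) / lam ** 2

t_m = time_of_maximum(lam, rho)
print(numeric_mean, closed_form_mean, t_m)
```

With these values the numerical mean should agree with $(\rho + 2\lambda)/\lambda^2$, and the density evaluated slightly to either side of $t_m$ should be smaller than at $t_m$.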
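More generally, if $\Omega(t)$ is a gamma distribution of order n,

$$\Omega(t) = e^{-\lambda t}\left[1 + \lambda t + \frac{(\lambda t)^2}{2!} + \cdots + \frac{(\lambda t)^n}{n!}\right], \tag{21}$$

the mean and variance of the time to failure are given by

$$E(t) = \frac{1}{\rho}\left[\left(1 + \frac{\rho}{\lambda}\right)^{n+1} - 1\right],$$

$$\sigma^2 = \frac{1}{\rho^2}\left(1 + \frac{\rho}{\lambda}\right)^{2n+2}
- \frac{2(n+1)}{\lambda\rho}\left(1 + \frac{\rho}{\lambda}\right)^{n} - \frac{1}{\rho^2}. \tag{22}$$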


In the preceding analysis we have studied the reliability as a function of time, 
and found equations involving the reliability function, assuming that the relay 
is subject to constant surveillance, i.e., that one could detect a failure as soon 
as it occurs. A somewhat more realistic model can be studied in which a failure 
can be detected only when one attempts to operate the relay. It will be seen 
that the results for this model are somewhat simpler than for the preceding. 
The parameter of main interest is now the number of cycles to failure rather 
than the actual time to failure. 

In order to treat this problem, we consider each time of operation as a
regeneration point for the process. Let $\beta$ be the probability that the relay survives
a single cycle. Then the expression for $\beta$ is

$$\beta = -\int_0^\infty \Omega(t) \, dQ(t). \tag{23}$$

We shall assume for simplicity that the relay operates initially at $t = 0$, although
the extension is quite trivial. Then the probability of failure at cycle n is

$$P_n = \beta^{n-1}(1 - \beta) \tag{24}$$

and the mean and variance may then be written

$$E(n) = \frac{1}{1 - \beta}, \qquad \sigma^2(n) = \frac{\beta}{(1 - \beta)^2}. \tag{25}$$


As an example, with the functions of Eq. (10) we find

$$E(n) = 1 + \frac{\rho}{\lambda}, \qquad \sigma^2(n) = \frac{\rho(\rho + \lambda)}{\lambda^2}. \tag{26}$$
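
The cycle-count results can be illustrated numerically. In this sketch the per-cycle survival probability $\beta$ is obtained by integrating $\Omega(t)\,q(t)$, which equals $-\int \Omega\,dQ$ when $Q$ is differentiable, and the geometric mean and variance are compared with the exponential-case expressions. The rates, grid and function names are hypothetical choices made for illustration.

```python
import math


def survival_per_cycle(omega, q, t_max, h):
    """beta = integral over (0, inf) of Omega(t) q(t) dt, approximated by
    the trapezoid rule on [0, t_max]."""
    n = int(round(t_max / h))
    total = 0.5 * (omega(0.0) * q(0.0) + omega(t_max) * q(t_max))
    for i in range(1, n):
        t = i * h
        total += omega(t) * q(t)
    return h * total


lam, rho = 0.5, 2.0   # hypothetical failure and demand rates

beta = survival_per_cycle(
    omega=lambda t: math.exp(-lam * t),       # conditional reliability
    q=lambda t: rho * math.exp(-rho * t),     # density of demand intervals
    t_max=50.0,
    h=0.001,
)

# Geometric distribution of the number of cycles to failure.
mean_cycles = 1.0 / (1.0 - beta)
var_cycles = beta / (1.0 - beta) ** 2

print(beta, mean_cycles, var_cycles)
```

For exponential $Q$ and $\Omega$ the integral has the closed form $\beta = \rho/(\rho + \lambda)$, so the mean reduces to $1 + \rho/\lambda$.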


MODEL OF A SYSTEM CAPABLE OF EXISTING IN A NUMBER OF STATES 


The second model which we propose to study consists of a system which is
capable of being in one of a number of states of class A, or in a state of failure.
If the system is in one of the states of the class A, it is considered to be operative.
We specialize this treatment to the case where A consists of at most a
denumerable set of states $A_1, A_2, A_3, \ldots$. We shall allow the system to make
transitions between states $A_i \to A_j$ at times $t_i$ which may or may not be random
variables. If the system is in state $A_n$ we assume that it has a certain reliability
function $\Omega_n(t)$. We require $R(t)$, the unconditional reliability function for the
system. A large number of actual physical models are described by the class of
models to be analyzed here. We mention, in particular, the model of H. K. Weiss¹,
which postulates a number of faults in a system to be developed. Each of these
faults may or may not be discovered. When a fault is discovered, it is corrected.
The reliability function is determined by the number of faults still undetermined
in the system. The theory is concerned with the estimation of the number of
faults remaining in the system, from a study of the system failures. Here we
shall concern ourselves with a study of the system reliability as a whole. In
this particular example the class A would consist of states specified by the
number of faults remaining in the device being developed. The second physical
situation to which the following models apply pertains to the case of batteries
which have failure characteristics which depend on the amount of time spent
on the shelf as well as the conditions of operation when the batteries are actually
installed.
For the analysis we make the following definitions:

$\phi_n(t)\,dt$: The probability that a single duration of state n will last between $t$
and $t + dt$ units of time.

$\Omega_n(t)$: The probability that the system will remain operative for $t$ or longer
if it is in state n.

$w_n(t)\,dt$: The probability that an occurrence of state n begins at some time
in $(t, t + dt)$.

$Q_n(t)$: The probability that an occurrence of state n lasts for $t$ or longer; it
is related to $\phi_n(t)$ by

$$Q_n(t) = \int_t^\infty \phi_n(x) \, dx. \tag{27}$$


The probability of a failure in $(t, t + dt)$ is $-R'(t)\,dt$, assuming the proper
differentiability properties. An expression for this function is

$$R'(t) = \sum_n \int_0^t w_n(\tau)\, Q_n(t - \tau)\, \Omega_n'(t - \tau) \, d\tau \tag{28}$$

where $\Omega_n(t)$ and $Q_n(t)$ are known, and it remains for us to find an expression for
$w_n(t)$. Equation (28) can be derived by noting that the probability that a failure
will take place from some state n at time $t$ is the probability that the occurrence
of state n began in some interval $(\tau, \tau + d\tau)$, with probability $w_n(\tau)\,d\tau$, did not
conclude at any time in the interval $t - \tau$, with probability $Q_n(t - \tau)$, and a
failure occurred after $T$ units of time, where $t - \tau < T < t - \tau + dt$, with
probability $-\Omega_n'(t - \tau)\,dt$. Finally the breakdown can occur in any state, hence
the summation. When the set A consists of N, a finite number of states, this
relation becomes

$$R'(t) = \sum_{n=1}^{N-1} \int_0^t w_n(\tau)\, Q_n(t - \tau)\, \Omega_n'(t - \tau) \, d\tau
+ \int_0^t w_N(\tau)\, \Omega_N'(t - \tau) \, d\tau \tag{29}$$


for the system makes no transition from the Nth state. We can write this equation
in the same form as Eq. (28) provided that we define $Q_N(t) = 1$. The $w_n(t)$
satisfy the chain of relations

$$w_1(t) = \delta(t), \qquad
w_n(t) = \int_0^t w_{n-1}(\tau)\, \phi_{n-1}(t - \tau)\, \Omega_{n-1}(t - \tau) \, d\tau. \tag{30}$$


Both Eqs. (28) and (30) are of the convolution form, which suggests that these
relations are more simply stated in Laplace transform language. To simplify
notation we introduce two (known) functions

$$\theta_n(t) = \phi_n(t)\, \Omega_n(t), \qquad \beta_n(t) = Q_n(t)\, \Omega_n'(t). \tag{31}$$

With our usual convention regarding Laplace transforms we find

$$w_n^*(s) = w_{n-1}^*(s)\, \theta_{n-1}^*(s), \qquad
sR^*(s) - 1 = \sum_n w_n^*(s)\, \beta_n^*(s). \tag{32}$$

The first of these relations is easily solved for $w_n^*(s)$,

$$w_n^*(s) = \prod_{j=1}^{n-1} \theta_j^*(s) \tag{33}$$

or

$$R^*(s) = \frac{1}{s}\left[1 + \sum_n \beta_n^*(s) \prod_{j=1}^{n-1} \theta_j^*(s)\right]. \tag{34}$$


Equation (34) constitutes a formal solution to our problem, and the moments 
are easily obtained. 


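Let us now apply these results to specific cases. In particular, we consider
the case specified by

$$\Omega_n(t) = \exp(-\lambda_n t)[1 + \lambda_n t], \qquad \phi_n(t) = \mu_n \exp(-\mu_n t). \tag{35}$$

From Eq. (35) we find

$$\theta_n(t) = \mu_n \exp[-(\mu_n + \lambda_n)t](1 + \lambda_n t), \qquad
\beta_n(t) = -\lambda_n^2\, t \exp[-(\mu_n + \lambda_n)t]. \tag{36}$$

Thus we have

$$w_1^*(s) = 1, \qquad
w_n^*(s) = \prod_{j=1}^{n-1} \frac{\mu_j (s + 2\lambda_j + \mu_j)}{(s + \lambda_j + \mu_j)^2},$$

$$R^*(s) = \frac{1}{s}\left[1 - \sum_n \frac{\lambda_n^2}{(s + \lambda_n + \mu_n)^2}
\prod_{j=1}^{n-1} \frac{\mu_j (s + 2\lambda_j + \mu_j)}{(s + \lambda_j + \mu_j)^2}\right]. \tag{37}$$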


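A special case that can be worked in detail is the two state system $A = (A_1, A_2)$.
We now have

$$w_1^*(s) = 1, \qquad
w_2^*(s) = \frac{\mu_1}{\mu_1 + \lambda_1 + s} + \frac{\mu_1 \lambda_1}{(\mu_1 + \lambda_1 + s)^2}. \tag{38}$$

The use of Eq. (34) leads to an expression for the probability density for failures.
Writing $d = \lambda_1 + \mu_1 - \lambda_2$ (taken to be nonzero) and recalling that $Q_2(t) = 1$,

$$-R'(t) = \lambda_1^2\, t\, e^{-(\lambda_1 + \mu_1)t}
+ \frac{\mu_1 \lambda_2^2}{d^2}\left[e^{-(\lambda_1 + \mu_1)t} - e^{-\lambda_2 t}(1 - dt)\right]
+ \frac{\lambda_1 \mu_1 \lambda_2^2}{d^3}\left[(2 + dt)\, e^{-(\lambda_1 + \mu_1)t} - (2 - dt)\, e^{-\lambda_2 t}\right]. \tag{39}$$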


THE GENERAL MARKOFF MODEL


The preceding formal analysis can be generalized to describe a situation in 
which the system can make transitions between any two states. An application 
of this model would be to a system which may have parts replaced during 
operation. 

For simplicity we will consider only the case of a finite number of states. 
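We define $P_{jk}$ to be the probability that an occurrence of state k is followed
by an occurrence of state j. We define $\mathbf{P}$ to be a matrix whose (j, k) element
is $P_{jk}$. We define vectors $\mathbf{W}(t)$, $\mathbf{A}_1$, and $\mathbf{A}_N$ by

$$\mathbf{W}(t) = \begin{pmatrix} w_1(t) \\ w_2(t) \\ \vdots \\ w_N(t) \end{pmatrix}, \qquad
\mathbf{A}_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \qquad
\mathbf{A}_N = (1, 1, \ldots, 1) \tag{40}$$

and the matrices $\boldsymbol{\theta}(t)$ and $\mathbf{B}(t)$ by

$$\boldsymbol{\theta}(t) = \begin{pmatrix} \theta_1(t) & & & 0 \\ & \theta_2(t) & & \\ & & \ddots & \\ 0 & & & \theta_N(t) \end{pmatrix}, \qquad
\mathbf{B}(t) = \begin{pmatrix} \beta_1(t) & & & 0 \\ & \beta_2(t) & & \\ & & \ddots & \\ 0 & & & \beta_N(t) \end{pmatrix} \tag{41}$$

where the functions $\theta_j(t)$ and $\beta_j(t)$ are defined by Eq. (31). We assume for
simplicity that an occurrence of state 1 always takes place at $t = 0$. In this
matrix notation the relation between $R'(t)$ and $\mathbf{W}(t)$ is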


















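$$R'(t) = \int_0^t \mathbf{A}_N\, \mathbf{B}(t - \tau)\, \mathbf{W}(\tau) \, d\tau \tag{42}$$

while the matrix function $\mathbf{W}(t)$ is found from the equation

$$\mathbf{W}(t) = \mathbf{A}_1\, \delta(t) + \int_0^t \mathbf{P}\, \boldsymbol{\theta}(t - \tau)\, \mathbf{W}(\tau) \, d\tau \tag{43}$$

where $\delta(t)$ is a delta function. If we adopt the convention that the Laplace
transform of a matrix is the matrix of the Laplace transforms of its elements,
and we denote the Laplace transform of a matrix by the same matrix with an
asterisk and an argument $s$, Eqs. (42) and (43) lead to

$$\mathbf{W}^*(s) = [\mathbf{I} - \mathbf{P}\boldsymbol{\theta}^*(s)]^{-1} \mathbf{A}_1,$$

$$R^*(s) = \frac{1}{s}\left\{1 + \mathbf{A}_N \mathbf{B}^*(s) \mathbf{W}^*(s)\right\}
= \frac{1}{s}\left\{1 + \mathbf{A}_N \mathbf{B}^*(s) [\mathbf{I} - \mathbf{P}\boldsymbol{\theta}^*(s)]^{-1} \mathbf{A}_1\right\}. \tag{44}$$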








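As an example of the preceding analysis we cite the result for the two state
system described by a transition matrix

$$\mathbf{P} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \tag{45}$$

and by the functions

$$\Omega_n(t) = \exp(-\lambda_n t), \qquad \phi_n(t) = \mu_n \exp(-\mu_n t). \tag{46}$$

The matrix $\mathbf{W}^*(s)$ is found to be

$$\mathbf{W}^*(s) = \frac{1}{(s + \lambda_1 + \mu_1)(s + \lambda_2 + \mu_2) - \mu_1 \mu_2}
\begin{pmatrix} (s + \lambda_1 + \mu_1)(s + \lambda_2 + \mu_2) \\ \mu_1 (s + \lambda_2 + \mu_2) \end{pmatrix} \tag{47}$$

which leads to an expression for $R^*(s)$

$$R^*(s) = \frac{1}{s}\left\{1 - \frac{\lambda_1 (s + \lambda_2 + \mu_2) + \lambda_2 \mu_1}{(s + \lambda_1 + \mu_1)(s + \lambda_2 + \mu_2) - \mu_1 \mu_2}\right\}
= \frac{s + \lambda_2 + \mu_1 + \mu_2}{(s + \lambda_1 + \mu_1)(s + \lambda_2 + \mu_2) - \mu_1 \mu_2}. \tag{48}$$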


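It is rather tedious to invert this transform but the mean time to failure and
the variance are given by

$$E(t) = \frac{\lambda_2 + \mu_1 + \mu_2}{\lambda_1 \lambda_2 + \mu_1 \lambda_2 + \mu_2 \lambda_1},$$

$$\sigma^2 = \frac{\lambda_2^2 + \mu_1^2 + \mu_2^2 + 2\lambda_2 \mu_2 + 2\mu_1 \mu_2 + 2\lambda_1 \mu_1}{(\lambda_1 \lambda_2 + \mu_1 \lambda_2 + \mu_2 \lambda_1)^2}. \tag{49}$$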
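
The two-state result can be checked by evaluating the matrix expression of Eq. (44) directly. The sketch below assumes exponential holding times and exponential conditional reliabilities as in Eq. (46), for which $\theta_n^*(s) = \mu_n/(s + \lambda_n + \mu_n)$ and $\beta_n^*(s) = -\lambda_n/(s + \lambda_n + \mu_n)$; the rates and function names are hypothetical, and the 2 by 2 inverse is written out by hand to keep the example free of external libraries.

```python
def laplace_reliability(s, lam, mu):
    """R*(s) for the alternating two-state system, from Eq. (44)."""
    theta = [mu[n] / (s + lam[n] + mu[n]) for n in range(2)]    # theta_n*(s)
    beta = [-lam[n] / (s + lam[n] + mu[n]) for n in range(2)]   # beta_n*(s)
    # I - P theta*(s) with P = [[0, 1], [1, 0]].
    a, b, c, d = 1.0, -theta[1], -theta[0], 1.0
    det = a * d - b * c
    # W*(s) is the first column of the inverse, because A_1 = (1, 0)^T.
    w = [d / det, -c / det]
    # A_N = (1, 1) sums the components of B*(s) W*(s).
    return (1.0 + beta[0] * w[0] + beta[1] * w[1]) / s


def closed_form(s, lam, mu):
    """Simplified rational form of R*(s) for the same system."""
    numerator = s + lam[1] + mu[0] + mu[1]
    denominator = (s + lam[0] + mu[0]) * (s + lam[1] + mu[1]) - mu[0] * mu[1]
    return numerator / denominator


lam = (0.2, 0.5)   # hypothetical failure rates in states 1 and 2
mu = (1.0, 2.0)    # hypothetical transition rates out of states 1 and 2

f = lam[0] * lam[1] + mu[0] * lam[1] + mu[1] * lam[0]
mean_formula = (lam[1] + mu[0] + mu[1]) / f
var_formula = (lam[1] ** 2 + mu[0] ** 2 + mu[1] ** 2 + 2 * lam[1] * mu[1]
               + 2 * mu[0] * mu[1] + 2 * lam[0] * mu[0]) / f ** 2

# E(t) = R*(0); E(t^2) = -2 dR*/ds at s = 0 (central difference here).
step = 1e-5
second_moment = -2.0 * (closed_form(step, lam, mu)
                        - closed_form(-step, lam, mu)) / (2.0 * step)
var_numeric = second_moment - closed_form(0.0, lam, mu) ** 2

print(mean_formula, var_formula, var_numeric)
```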


SUMMARY 


In this paper we have analyzed several models of systems which include 
hereditary or environmental effects. In each case the problem was the same, 
viz., given the conditional reliability functions, to find the unconditional reli- 
ability function. 

The first model discussed was one which involved hereditary effects by postu- 
lating that the conditional reliability functions depend on the times of operation 
of the mechanism. In order to render this model tractable we assumed in fact 
that only the last time of operation entered into the expression for the conditional 
reliability function, and that this function at any time ¢ depended only on the 
difference between the two times. We were able to write an integral equation 
for the reliability function. In the event that the times of operation followed 
a Poisson distribution we were able to derive specific results since the integral 
equation reduced to one of the convolution type and could then be solved by 
Laplace transform techniques. We solved for the unconditional reliability func- 
tion both as a function of time and as a function of the number of operating 
cycles. In the latter case the failure distribution was found to be a simple geo- 
metric one. 

These models are related to semi-Markoff processes which have recently
been studied in connection with reliability³, Geiger counter problems⁴, and
inventory problems⁵.


ACKNOWLEDGEMENT 


I would like to thank the referees for spotting a serious error which occurred 
in an earlier draft. 


BIBLIOGRAPHY 


1. H. K. Weiss, Estimation of Reliability Growth in a Complex System with a Poisson-Type
Failure, J. Opns. Res. 4, 532, (1956).
2. G. H. Weiss, On the Theory of Replacement of Machinery with a Random Failure Time,
Naval Res. Log. Quart. 3, 279, (1956).
3. G. H. Weiss, NAVORD Report 4351, On Semi-Markovian Processes and an Application
to Reliability Theory, U. S. Naval Ordnance Laboratory, White Oak, Md.
4. R. Pyke, On Renewal Processes Related to Type I and Type II Counter Models, Ann.
Math. Stat., 29, 737, (1958).
5. R. Pyke, Markovian Renewal Processes with Finitely Many States, CU-11-59 Nonr-
266(59) (to be published).





Vol. 3, No. 3 TECHNOMETRICS August, 1961


An Analysis of Some Relay Failure Data from 
a Composite Exponential Population 


R. R. Prairie and B. Ostle


North Carolina State College¹ and Arizona State University²


This paper presents an analysis of some data obtained from a life test on a group
of relays. Observation of the data indicated that the fitting of the common failure
distribution, the one parameter exponential, was not appropriate. As a result a
model was used which assumes that the hazard rate, $\lambda(x)$, is a step function. The
method of maximum likelihood is used to estimate the different hazard rates and
the points at which these rates change.


INTRODUCTION 


In many practical life test situations more than one failure mode may be 
operating. An example of this is the case where the units under test are subject 
to a high early hazard rate followed by a lower hazard rate which persists after 
the weaker units have failed. 

Several references on heterogeneous failure populations exist, e.g., Epstein
(1953), Madison (1955), Mendenhall and Hader (1958), and Miller (1960).
Of these, the one germane to the present study is the work of Miller. He considered
the case where there are two groups of units, one group representative
of the general quality of the product while the second may be classed as early
failures. The mathematical model considered by Miller assumes an early hazard
rate $\lambda_1$ until time $T_0$, after which a lower hazard rate $\lambda_2$ persists until all the
units have failed or testing is terminated at time $T_1$. Estimates of the population
parameters (i.e., of the $\lambda$'s) were obtained for situations where $T_0$ is known
and where $T_0$ is known only within a specified interval.

In the application reported in the present paper, it was necessary to extend
the method of Miller to the case of three hazard rates. That is, we must contend
with a hazard rate $\lambda_1$ until a point $T_0$, a second hazard rate $\lambda_2$ from $T_0$ to $T_1$,
and a third hazard rate $\lambda_3$ from $T_1$ until the test is censored at the point $T_2$. In
addition to the simple extension from two to three hazard rates, the approach
was modified to yield estimates of the two points at which the hazard rate
changes value. All parameters ($\lambda_1$, $\lambda_2$, $\lambda_3$, $T_0$ and $T_1$) are estimated using the
method of maximum likelihood.


DESCRIPTION OF THE LIFE TEST ON RELAYS


One hundred relays were subjected to a life test of 250,000 cycles
("operations"), or to failure, whichever occurred first. The relay coils were
actuated with 28.0 ± 0.5 vdc with equal "on" and "off" periods and the cycling
rate was 8 cycles per second. The contacts were made to "make" and "break"
a 200 ± 20 ma resistance current from a 6.0 ± 0.5 vdc source. The contact
loads were monitored continuously during the test to determine when a failure
occurred. Any mating contacts failing to "make" once during the test constituted
a failure. Sixty-six relays failed before 250,000 cycles (see Table 1), thirty-three
relays survived 250,000 cycles, and information on one relay was lost.

[Figure 1. Failure rate for data in Table 1. Vertical axis: failure rate; horizontal axis: number of operations.]

1, 2 Formerly with Sandia Corporation, Albuquerque, New Mexico.


ANALYSIS 


The first step in the analysis was the plotting of the data. In Figure 1, the 


failure rate is plotted over the range from 1 to 250,000 cycles. In Figure 2, 
the proportion still surviving is shown as a function of the number of cycles. 
Because of the importance of obtaining resolution in the range from zero to 
50,000 cycles two different scales were used. The scale in the interval zero to 
50,000 cycles is expanded over the scale used between 50,000 and 250,000 cycles. 


TABLE 1 
List of cycle numbers at which failure occurred 


7,098 25,005 102,322 214,196 
8,290 28,704 106,076 215,934 
9,784 31,153 107,912 216,092 
13,071 32,145 115,676 219,942 
14,794 34,000 118,483 241,137 
15,808 34,453 118,888 247,479 
16,273 38,757 127,903 
16,549 51,511 138,747 
16,803 56,551 144,709 
19,337 69,606 167,684 
20,281 78,594 170,153 
20,823 80,283 187,717 
21,555 80,989 196,134 
23,044 87,747 208,115 
24,468 98,509 210,221 


[Figure 2. Proportion surviving and the fitted survival functions based on the data in Table 1. Axes: proportion surviving versus number of operations, shown in two panels with different scales. Curves: three parameter exponential (solid line); one parameter exponential, $\lambda = .00000510$ (dashed line).]


Observation of the two figures plus a priori information indicated that use of 


the most common life distribution, the single parameter exponential, was in- 
appropriate. As a result, a model of the type proposed by Miller was adopted. 


The model used is one which assumes that the hazard rate, $\lambda(x)$, is a step
function of the form

$$\lambda(x) = \begin{cases}
\lambda_1, & 0 \le x \le T_0 \\
\lambda_2, & T_0 < x \le T_1 \\
\lambda_3, & x > T_1.
\end{cases}$$

From these assumptions regarding $\lambda(x)$, the discrete probability function for
the number of cycles to failure is approximated by the continuous probability
density function

$$\begin{aligned}
f_1(x) &= \lambda_1 \exp\{-\lambda_1 x\}, & 0 < x < T_0, \\
f_2(x) &= \lambda_2 \exp\{-\lambda_1 T_0 - \lambda_2 (x - T_0)\}, & T_0 < x < T_1, \\
f_3(x) &= \lambda_3 \exp\{-\lambda_1 T_0 - \lambda_2 (T_1 - T_0) - \lambda_3 (x - T_1)\}, & x > T_1.
\end{aligned} \tag{1}$$

[Figure 3. Graphical presentation of the hazard rate.]


In order to obtain estimates it was necessary to assume that $T_0$ and $T_1$ could
each be designated within some fixed interval, that is, $T_A < T_0 < T_B$, and
$T_C < T_1 < T_D$. This situation is depicted in Figure 3. The pattern of the n
observations associated with failures occurring before $T_2$ cycles can be
represented as

$$\begin{aligned}
0 &\le x_1, x_2, \ldots, x_r \le T_A, \\
T_A &< x_{r+1}, x_{r+2}, \ldots, x_s < T_B, \\
T_B &< x_{s+1}, x_{s+2}, \ldots, x_u < T_C, \\
T_C &< x_{u+1}, x_{u+2}, \ldots, x_v < T_D,
\end{aligned}$$

and

$$T_D \le x_{v+1}, x_{v+2}, \ldots, x_n \le T_2.$$

The remaining $N - n$ relays were still operating at the time the test was censored.

Since we are not certain how many of the $s - r$ observations between $T_A$ and
$T_B$ are associated with $f_1(x)$ and $f_2(x)$, respectively, these observations will be
used only as attributes information. A similar statement applies to the $v - u$
observations between $T_C$ and $T_D$. The likelihood function, therefore, is given by


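\[
\begin{aligned}
L ={}& \Bigl[\prod_{i=1}^{r} f_1(x_i)\Bigr]
       \bigl[P\{T_A < x < T_B\}\bigr]^{s-r}
       \Bigl[\prod_{j=s+1}^{u} f_2(x_j)\Bigr] \\
     & \cdot \bigl[P\{T_C < x < T_D\}\bigr]^{v-u}
       \Bigl[\prod_{k=v+1}^{n} f_3(x_k)\Bigr]
       \bigl[P\{x > T_E\}\bigr]^{N-n} \\
  ={}& \Bigl[\lambda_1^{\,r} \exp\Bigl\{-\lambda_1 \sum_{i=1}^{r} x_i\Bigr\}\Bigr]
       \cdot \bigl[\exp\{-\lambda_1 T_A\} - \exp\{-\lambda_1 T_0 - \lambda_2 (T_B - T_0)\}\bigr]^{s-r} \\
     & \cdot \Bigl[\lambda_2^{\,u-s} \exp\Bigl\{-(u-s)\lambda_1 T_0 - \lambda_2 \sum_{j=s+1}^{u} (x_j - T_0)\Bigr\}\Bigr] \\
     & \cdot \bigl[\exp\{-\lambda_1 T_0 - \lambda_2 (T_C - T_0)\}
       - \exp\{-\lambda_1 T_0 - \lambda_2 (T_1 - T_0) - \lambda_3 (T_D - T_1)\}\bigr]^{v-u} \\
     & \cdot \Bigl[\lambda_3^{\,n-v} \exp\Bigl\{-(n-v)\lambda_1 T_0 - (n-v)\lambda_2 (T_1 - T_0)
       - \lambda_3 \sum_{k=v+1}^{n} (x_k - T_1)\Bigr\}\Bigr] \\
     & \cdot \bigl[\exp\{-\lambda_1 T_0 - \lambda_2 (T_1 - T_0) - \lambda_3 (T_E - T_1)\}\bigr]^{N-n} .
\end{aligned}
\]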


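Denoting the logarithm of the likelihood by \(\ln L = Q\), the next step is to solve
the equations

\[
\frac{\partial Q}{\partial \lambda_i} = 0 \quad (i = 1, 2, 3), \qquad
\frac{\partial Q}{\partial T_j} = 0 \quad (j = 0, 1)
\]

for \(\lambda_1\), \(\lambda_2\), \(\lambda_3\), \(T_0\) and \(T_1\). These approximate maximum likelihood solutions are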


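\[
\hat\lambda_1 = r \Big/ \Bigl[\sum_{i=1}^{r} x_i + (N - r)\,T_A\Bigr],
\]

\[
\hat\lambda_2 = (u - s) \Big/ \Bigl[\sum_{j=s+1}^{u} x_j - (N - s)\,T_B + (N - u)\,T_C\Bigr],
\]

\[
\hat\lambda_3 = (n - v) \Big/ \Bigl[\sum_{k=v+1}^{n} x_k - (N - v)\,T_D + (N - n)\,T_E\Bigr],
\]

\[
\hat T_0 = \bigl[\ln[(N - r)/(N - s)] + \hat\lambda_1 T_A - \hat\lambda_2 T_B\bigr] / (\hat\lambda_1 - \hat\lambda_2),
\]

and

\[
\hat T_1 = \bigl[\ln[(N - u)/(N - v)] + \hat\lambda_2 T_C - \hat\lambda_3 T_D\bigr] / (\hat\lambda_2 - \hat\lambda_3).
\]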
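The closed-form estimates above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `approx_mle`, its argument names, and the convention that an observation exactly equal to a boundary is counted in the lower group are all hypothetical choices, not part of the paper, and the relay data themselves are not reproduced here.

```python
import math

def approx_mle(times, N, T_A, T_B, T_C, T_D, T_E):
    """Approximate ML estimates for the three-step hazard model.

    times : failure times (cycles) of the relays that failed before T_E
    N     : total number of relays on test
    T_E   : cycle count at which the test was censored
    Boundary ties are counted in the lower group (an assumption).
    """
    x = sorted(times)
    n = len(x)
    r = sum(t <= T_A for t in x)   # failures up to T_A
    s = sum(t <= T_B for t in x)   # failures up to T_B
    u = sum(t <= T_C for t in x)   # failures up to T_C
    v = sum(t <= T_D for t in x)   # failures up to T_D

    # Each rate is (number of failures) / (total exposure in the interval).
    lam1 = r / (sum(x[:r]) + (N - r) * T_A)
    lam2 = (u - s) / (sum(x[s:u]) - (N - s) * T_B + (N - u) * T_C)
    lam3 = (n - v) / (sum(x[v:]) - (N - v) * T_D + (N - n) * T_E)

    # Transition points from the attributes information in (T_A, T_B) and (T_C, T_D).
    T0 = (math.log((N - r) / (N - s)) + lam1 * T_A - lam2 * T_B) / (lam1 - lam2)
    T1 = (math.log((N - u) / (N - v)) + lam2 * T_C - lam3 * T_D) / (lam2 - lam3)
    return {"lam1": lam1, "lam2": lam2, "lam3": lam3, "T0": T0, "T1": T1,
            "r": r, "s": s, "u": u, "v": v, "n": n}
```

With the paper's choice of boundaries one would call `approx_mle(times, N, 1500, 5000, 25000, 35000, T_E)`, supplying the failure times of Table 1, the number of relays on test, and the censoring point.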


In order to determine the values of \(T_A\), \(T_B\), \(T_C\), and \(T_D\), which in turn
determine r, s, u, and v, Figures 1 and 2 along with a priori information were
studied. On the basis of information gained from these two sources, the values
of \(T_A\), \(T_B\), \(T_C\), and \(T_D\) were taken as

\[
T_A = 1{,}500, \quad T_B = 5{,}000, \quad T_C = 25{,}000, \quad T_D = 35{,}000
\]

which, from the data, gave r = 7, s = 12, u = 30, and v = 36. Computations
yielded the estimates of the parameters

\[
\hat\lambda_1 = .00004943, \quad \hat\lambda_2 = .00001158, \quad \hat\lambda_3 = .000002923,
\]
\[
\hat T_0 = 1903, \quad \hat T_1 = 32120.
\]
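The survival function, given by

\[
S(x) =
\begin{cases}
\exp\{-\lambda_1 x\}, & 0 \le x \le T_0 \\
\exp\{-\lambda_1 T_0 - \lambda_2 (x - T_0)\}, & T_0 < x \le T_1 \\
\exp\{-\lambda_1 T_0 - \lambda_2 (T_1 - T_0) - \lambda_3 (x - T_1)\}, & x > T_1 ,
\end{cases}
\]

is estimated by

\[
\hat S(x) =
\begin{cases}
\exp\{-.00004943x\}, & 0 \le x \le 1903 \\
\exp\{-.07202 - .00001158x\}, & 1903 < x \le 32120 \\
\exp\{-.35015 - .000002923x\}, & x > 32120 .
\end{cases}
\]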


This function was then plotted on Figure 2 for comparison with the observed
data. For comparative purposes, the single parameter exponential was fitted
to the data. The graph of the survival function for the single parameter
exponential is also shown in Figure 2. It is apparent that use of the single
parameter exponential would have led to erroneous conclusions regarding the
probability of survival up to any specific cycle.
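The fitted survival function can be evaluated directly from the five estimates. The sketch below is illustrative only: the function names are hypothetical, and the single parameter rate `0.00000510` is an assumption read from the legend of Figure 2, whose decimal point did not survive reproduction.

```python
import math

# Estimates quoted in the text.
LAM1, LAM2, LAM3 = 0.00004943, 0.00001158, 0.000002923
T0, T1 = 1903.0, 32120.0

def survival(x):
    """Three-step survival function S(x) built from the estimates above."""
    if x <= T0:
        return math.exp(-LAM1 * x)
    if x <= T1:
        return math.exp(-LAM1 * T0 - LAM2 * (x - T0))
    return math.exp(-LAM1 * T0 - LAM2 * (T1 - T0) - LAM3 * (x - T1))

def survival_single(x, lam=0.00000510):
    """Single parameter exponential; the default rate is an assumed reading of Figure 2."""
    return math.exp(-lam * x)

if __name__ == "__main__":
    for cycles in (1000, 5000, 25000, 50000, 100000):
        print(cycles, round(survival(cycles), 4), round(survival_single(cycles), 4))
```

The constants .07202 and .35015 in the estimated survival function are the quantities `(LAM1 - LAM2) * T0` and `(LAM1 - LAM2) * T0 + (LAM2 - LAM3) * T1`, up to rounding of the printed estimates.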


SUMMARY 


A density function involving three different hazard rates with unknown
points of transition (i.e., unknown points at which the hazard rate changes)
has been considered. Estimates of the five unknown parameters were obtained
by maximum likelihood. A generalization of these results to situations involving
more than three hazard rates and two points of transition is straightforward.

The method has been tried, with good results, on a set of observed data. 
Since situations of the type described in the example must occur quite frequently, 
it appears that the method of analysis presented here may find application in 
a wide range of applied problems. 


REFERENCES 


1. Epstein, B., Statistical Problems in Life Testing. Proceedings of the Seventh Annual
Convention, American Society for Quality Control, pp. 385-398, 1953.

2. Madison, R. L., Applications of Statistical Methods in Evaluating Performance of
Electronic Equipment. Proceedings of the Ninth Annual Convention, American Society for
Quality Control, pp. 209-217, 1955.

3. Mendenhall, W., and Hader, R. J., Estimation of Parameters of Mixed Exponentially
Distributed Failure Time Distributions From Censored Life Test Data. Biometrika,
Vol. 45, pp. 504-520, 1958.

4. Miller, R. G., Early Failures in Life Testing. Journal of the American Statistical Association,
Vol. 55, No. 291, pp. 491-502, September, 1960.

















Vol. 3, No. 3 TECHNOMETRICS August, 1961


Applications of Truncated Distributions in 
Process Start-ups and Inventory Control 


H. Smith and D. W. Grace
The Procter and Gamble Company 










Many practical problems arise in which the study of an asymmetric range of
values of some variable is important. This paper treats two applications in which
this kind of study is made. The first is a process start-up problem, and the other
concerns inventory control.
















Process Start-Ups, Type A


In many processing start-ups in the food and chemical industries, the quality 
of the product varies until production conditions become stabilized. Consider 
the case in which the scrapping of off-quality material results in the destruction 
of the containers in which it is packed. This is particularly true in the food 
industry where a can is considered contaminated once it has been filled. Intro- 
duction of defective cans into the packing line to receive the initial output of 
the system is often the only way to avoid the costly destruction of acceptable 
cans. Of course, all such defective cans must be scrapped whether or not they 
contain off-quality product. 

The problem which faces the production man is, ‘How many defective cans 
should be used at start-up to minimize the total scrapping cost?’ This question 
can be answered in the following manner: 

Let n = number of defective cans used at start-up,

\(c_1\) = cost of scrapping and reworking a can of product,
and \(c_2\) = cost of scrapping and reworking a can of product, plus the cost
of a new can. (\(c_2 > c_1\))
Thus, the total cost of scrap at start-up on a particular day is:
\[
\text{T.C.} = c_1 n + c_2 (x - n) \quad (\text{when } x > n),
\]
where x is the number of cans of off-quality product, and n is the number of
defective cans used. (When \(x \le n\), only the n defective cans are scrapped and
the cost is \(c_1 n\).)


The expected value is given by:

\[
E(\text{T.C.}) = c_1 n + c_2\,[E(x \mid x > n) - n]\,[\Pr(x > n)].
\]

Assuming x to have a continuous density function, f(x), the above expression
reduces to:

\[
E(\text{T.C.}) = c_1 n + c_2 \int_n^{\infty} (x - n) f(x)\,dx.
\]

The value of n which minimizes E(T.C.) is obtained by differentiating the
above expression with respect to n, equating to zero, and solving for n. Thereby,
we obtain

\[
c_1/c_2 = 1 - F(n),
\]

where F(x) is the cumulative distribution function of x.

[Figure 1 plot: the truncated curve and the complete curve of f(x) plotted against X, with the points 0, μ and n marked on the X axis.]

Figure 1. Simultaneous plot of truncated and complete distributions.

Case 1. Assume the distribution of scrapped material to be \(N(\mu, \sigma)\). Then, n
must be chosen so that \((n - \mu)/\sigma\) cuts off an area of \(c_1/c_2\) from the normal curve.
Example. Suppose the cost of scrapping and reworking a can of product is
$.04 and new cans cost $.08. Prior experience indicates off-quality product to be
\(N(\mu = 100 \text{ cans}, \sigma = 40 \text{ cans})\).

Given: \(c_1 = .04\); \(c_2 = .12\); \(c_1/c_2 = 1/3\). Then,

\[
1 - F(n) = \int_n^{\infty} f(x)\,dx = 1/3.
\]


Thus, n = 117 defective cans. Other values of \(c_1/c_2\) yield the following values
of n:

c1/c2    .1    .2    .3    .4    .5    .6    .7    .8    .9   1.0

n       151   134   121   110   100    90    79    66    49     0
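The Case 1 rule, choose n so that the upper tail area of the normal curve equals the cost ratio, can be checked with the Python standard library. This is a minimal sketch; the function name `defective_cans` is a hypothetical label, and the ratio 1.0 is excluded because the normal quantile is unbounded there (the table simply reports n = 0).

```python
from statistics import NormalDist

def defective_cans(c1, c2, mu, sigma):
    """Return n such that 1 - F(n) = c1/c2 when x is N(mu, sigma)."""
    return NormalDist(mu, sigma).inv_cdf(1.0 - c1 / c2)

if __name__ == "__main__":
    # Worked example from the text: c1 = .04, c2 = .12, N(100, 40).
    print(round(defective_cans(0.04, 0.12, 100, 40)))
    # Remaining cost ratios of the table, .1 through .9.
    print([round(defective_cans(k / 10, 1.0, 100, 40)) for k in range(1, 10)])
```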


Case 2. Assume the distribution of scrapped material to be truncated normal,
with the truncation point at zero. Then, the mean and variance of the underlying
normal distribution can be estimated using Cohen's method. (1)

For left truncation, the true mean, \(\mu\), is smaller than \(\bar X\) and the true variance,
\(\sigma^2\), is larger than \(S^2\). Cohen gives the following estimates:

\[
\hat\sigma^2 = S^2 + \theta(\bar X - X_0)^2, \qquad
\hat\mu = \bar X - \theta(\bar X - X_0),
\]

where \(X_0\) is the truncation point, and \(\theta\) is an auxiliary function which is tabulated
and plotted in the reference article,

\[
\theta(\xi) = \frac{Z(\xi)}{Z(\xi) - \xi},
\]

where \(\xi\) is the standardized truncation point and \(Z(\xi) = f(\xi)/[1 - \phi(\xi)]\), the ordinate
over the area.


























Referring to Figure 1, the tail area cut off by \((n - \mu)/\sigma\) in the complete, or
non-truncated, curve is smaller than the corresponding tail area of the truncated
curve by a factor of \([1 - \phi(0)]\), where \(\phi(0)\) is the area of the tail below X = 0.

In solving for n, we want to cut off an area from the truncated curve equal
to \(c_1/c_2\). This corresponds to an area of \([1 - \phi(0)]\,c_1/c_2\) under the complete curve.
Therefore, to find n, we have only to multiply \(c_1/c_2\) by \(1 - \phi(0)\) and find the
value of n for which \((n - \mu)/\sigma\) cuts off that tail area from the complete curve.
Example. Taking a random sample of size n = 100 from the truncated population,
suppose \(\bar X = 100\) and \(S^2 = 1600\). Using Cohen's equations

\[
\hat\mu = \bar X - \theta(\bar X - X_0) = 100 - .00906(100) = 99.1,
\]

and

\[
\hat\sigma^2 = S^2 + .00906(100)^2 = 1691, \qquad \hat\sigma = 41.12,
\]

also, \(1 - \phi(0) = .992\) since \(99.1/41.12 = 2.41\).











Tail area of
complete curve   .0992 .1984 .2976 .3968 .4960 .5952 .6944 .7936 .8928 .992

New value for n
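The Case 2 adjustment can be sketched as follows. The sketch assumes that the value of the auxiliary function (.00906 in the example) has already been read from Cohen's tables and is passed in as `theta`; the function names are hypothetical, and the values of n it produces are computed from the stated rule rather than copied from the paper.

```python
import math
from statistics import NormalDist

def cohen_estimates(xbar, s2, theta, x0=0.0):
    """Cohen's estimates of the mean and standard deviation of the complete normal."""
    mu = xbar - theta * (xbar - x0)
    sigma = math.sqrt(s2 + theta * (xbar - x0) ** 2)
    return mu, sigma

def defective_cans_truncated(c1, c2, mu, sigma, x0=0.0):
    """n cutting off a tail of c1/c2 from the curve truncated at x0."""
    complete = NormalDist(mu, sigma)
    kept = 1.0 - complete.cdf(x0)      # the factor 1 - phi(0) when x0 = 0
    tail = kept * c1 / c2              # equivalent tail area of the complete curve
    return complete.inv_cdf(1.0 - tail)

if __name__ == "__main__":
    mu, sigma = cohen_estimates(100.0, 1600.0, 0.00906)
    print(round(mu, 1), round(sigma, 2))
    print(round(defective_cans_truncated(0.04, 0.12, mu, sigma), 1))
```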








Process Start-Ups, Type B 


Another type of process startup is one in which most of the process corrections
can be made during the production of the first few items. The frequency of
defective product on daily startups can be satisfactorily approximated by the
exponential distribution, i.e.,

\[
f(x) = \frac{1}{\mu}\, e^{-x/\mu} \quad (x > 0).
\]

The problem is essentially the same as before with the Expected Total Cost as

\[
E(\text{T.C.}) = n c_1 + c_2 \int_n^{\infty} (x - n) f(x)\,dx.
\]

Differentiating the E(T.C.) with respect to n, we have the solution,

\[
n = \mu(\ln c_2 - \ln c_1)
\]




where \(\mu\) = mean of the underlying population. \(\mu\) is usually known by experience;
if it is not, it must be estimated through a sampling procedure.
Example:

Given: \(c_1 = .04\), \(c_2 = .12\), \(\mu = 6\) items
then: n = 6[ln .12 - ln .04] = 6[ln 3] = 6[1.098]
or n = 6.588 or 7 items.

Note: when \(c_2 > 2.72\,c_1\), \(n > \mu\).
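For the exponential case the solution is a one-line computation. A minimal sketch follows, with `startup_items` as a hypothetical function name.

```python
import math

def startup_items(c1, c2, mu):
    """n = mu * (ln c2 - ln c1) for exponentially distributed off-quality output."""
    return mu * math.log(c2 / c1)

if __name__ == "__main__":
    n = startup_items(0.04, 0.12, 6)
    print(n, math.ceil(n))   # fractional solution and the next whole item
```

The unrounded value differs slightly from the 6.588 of the worked example because the text rounds ln 3 to 1.098 before multiplying.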


INVENTORY CONTROL 


A similar development is used in the inventory control application, where \(c_1\)
is the unit cost of keeping the inventory and \(c_2\) is the unit cost of running out.
Most managers are reluctant to put a price tag on running out of stock if customer
service is involved. However, decisions must be made regarding inventory
level and it sometimes is useful to look at these decisions in retrospect and
determine the implied cost of runout.

The problem then becomes, “What is the implied unit cost of running out
of stock, inherent in the decision to keep n units of inventory?”

The expected total cost for a given time period is:

\[
E(\text{T.C.}) = c_1 n + c_2
\begin{pmatrix}\text{expected no. of runouts}\\ \text{in the time period}\end{pmatrix}
\begin{pmatrix}\text{expected no. of units}\\ \text{short when runout occurs}\end{pmatrix}
\]

Using the same development as before, we find:

\[
c_2 = \frac{c_1}{1 - \phi(n)}, \qquad \text{where } \phi(n) = \int_{-\infty}^{n} f(x)\,dx.
\]

In words, this result means that the implied unit cost of runout, \(c_2\), inherent
in the decision to keep an inventory of n units, is determined by dividing the
unit cost of carrying inventory, \(c_1\), by the area of the tail of the normal curve
above the point \((n - \mu)/\sigma\).

Example: Suppose the safety (or buffer) inventory is designed to protect against
running out of stock during a “change time” of two weeks. That is, regular
production satisfies regular shipments and it takes two weeks to alter the
production level in the event demand increases.

Suppose, further, that the cost of carrying inventory is $.01/case/2 weeks.
If bi-weekly demand is \(N(100{,}000;\ \sigma = 10{,}000)\) and the decision has been made
to maintain a safety stock of 20,000 cases (\(2\sigma\)), what is the implied unit cost
of running out?

\[
c_2 = \frac{.01}{.0228} = \$.44 \text{ per case}.
\]
Thus the manager is provided with a numerical basis upon which to review 
his inventory policies. 
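The implied runout cost in this example follows from the normal tail area above the stocking level. The sketch below uses the standard library; `implied_runout_cost` is a hypothetical name, and the stocking level is taken as mean demand plus the safety stock, which is how the example's figure of two standard deviations is read here.

```python
from statistics import NormalDist

def implied_runout_cost(c1, n, mu, sigma):
    """c2 = c1 / (1 - phi(n)), with phi the normal cdf of demand."""
    tail = 1.0 - NormalDist(mu, sigma).cdf(n)
    return c1 / tail

if __name__ == "__main__":
    # Demand N(100,000; 10,000), safety stock 20,000 cases, carrying cost $.01.
    print(round(implied_runout_cost(0.01, 100_000 + 20_000, 100_000, 10_000), 2))
```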


REFERENCE 


(1) Cohen, A. C., Jr., “Simplified Estimators for the Normal Distribution When Samples
Are Singly Censored or Truncated,” Technometrics, Vol. 1, No. 3, August, 1959.







Vol. 3, No. 3 TECHNOMETRICS August, 1961


Estimating the Poisson Parameter from Samples
That Are Truncated on the Right*


A. Clifford Cohen, Jr.
The University of Georgia 


1. INTRODUCTION 


Estimation of the Poisson parameter from various types of truncated and 
censored samples was studied in an earlier paper by the writer [1]. More recently, 
considerable attention [2], [3], [4], [5], [7], [8], [9], [11] has been directed to the 
special case in which zero values of the random variable are missing. In this 
paper we are concerned with samples that are truncated on the right at a termi- 
nus which we designate as d. In order that any given member of a population 
might be included in a sample of the type under consideration, it is necessary 
that \(x \le d\). Population members for which \(x > d\) are restricted from observation.
Manufacturing processes in which x, the number of defects per item, is a Poisson 
distribution random variable often give rise to samples of this type. It is not 
uncommon to find an inspection being performed which results in rejecting all 
items for which x > d, where here d is the maximum number of defects per- 
missible per item. Samples selected from the screened (accepted) production 
are accordingly truncated on the right at d. 


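2. MAXIMUM LIKELIHOOD ESTIMATION
The conditional Poisson probability function applicable here may be written as

\[
p(x) = \frac{e^{-\lambda}\lambda^{x}}{x!\,[F(d)]}, \qquad x = 0, 1, 2, \ldots, d; \quad \lambda > 0, \tag{1}
\]

where F(d) is the cumulative Poisson function

\[
F(d) = \sum_{x=0}^{d} \frac{e^{-\lambda}\lambda^{x}}{x!}. \tag{2}
\]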


The maximum likelihood estimating equation for a sample consisting of n
observations of random variable x with probability function (1) was derived
in [1], and with slight changes in notation may be expressed as

\[
\bar x = \lambda F(d - 1)/F(d), \tag{3}
\]

where \(\bar x\) is the sample mean

\[
\bar x = \sum_{i=1}^{n} x_i / n. \tag{4}
\]
* Sponsored by the Office of Ordnance Research, U. S. Army.

In [1], the solution of (3) was accomplished through a trial and error
interpolative procedure carried out with the aid of Molina's Tables [6]. The solution
of this estimating equation is greatly simplified with the aid of the accompanying
tables which give \(\bar x\) as a function of \(\lambda\) and d. In practical applications, with
d specified and with \(\bar x\) known from a given sample, the maximum likelihood
estimate \(\hat\lambda\) can be readily obtained by inverse linear interpolation between the
pair of successive table entries which brackets the sample value, \(\bar x\). This estimate
is of course consistent and asymptotically efficient although it might be biased.
The possible effect of bias in samples of the type under consideration has not
yet been fully investigated, but it seems unlikely to be troublesome except when
n is small. For samples of size 50 or larger where truncation has resulted in
eliminating not more than twenty or perhaps twenty-five percent of the original
distribution, it appears safe to neglect bias as a potential source of estimate error.
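Where tables are not at hand, the right-hand side of (3) can be evaluated and inverted numerically. The sketch below substitutes bisection for the table interpolation described above; that substitution, and the function names `xbar_of_lambda` and `mle_lambda`, are choices made for this illustration and are not part of the paper.

```python
import math

def xbar_of_lambda(lam, d):
    """Right-hand side of (3): lam * F(d-1) / F(d)."""
    # The common factor exp(-lam) cancels in the ratio, so it is omitted.
    terms = [lam ** x / math.factorial(x) for x in range(d + 1)]
    return lam * sum(terms[:-1]) / sum(terms)

def mle_lambda(xbar, d, tol=1e-10):
    """Solve (3) for lam by bisection; requires 0 < xbar < d."""
    if not 0 < xbar < d:
        raise ValueError("the estimate exists only for 0 < xbar < d")
    lo, hi = 0.0, max(1.0, xbar)
    while xbar_of_lambda(hi, d) < xbar:   # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if xbar_of_lambda(mid, d) < xbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print(round(xbar_of_lambda(3.8, 8), 5))   # one entry of the kind tabulated in Table 1
    print(round(mle_lambda(3.7750, 8), 3))    # sample mean and terminus of Section 4
```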

It is interesting to note that if the expression given in (2) for F(d) is substituted
into (3), the result may be simplified to the following polynomial equation of
degree d in \(\lambda\)

\[
\sum_{i=0}^{d} \lambda^{i} \bigl[{}_{d}P_{d+1-i} - \bar x\, {}_{d}P_{d-i}\bigr] = 0, \tag{5}
\]

where \({}_{n}P_{r}\) is the usual permutation symbol

\[
{}_{n}P_{r} = n!/(n - r)!, \quad r \le n; \qquad {}_{n}P_{r} = 0, \quad r > n. \tag{6}
\]

The required estimate \(\hat\lambda\) is the positive root of (5). Descartes' “rule of signs”
enables us to establish the fact that (5) has exactly one positive root inasmuch as
the coefficients of \(\lambda^{i}\) exhibit but one change of sign. The constant term (the
coefficient of \(\lambda^{0}\)) is \(-d!\,\bar x\). The coefficient of \(\lambda^{d}\) is \((d - \bar x)\), which is positive or zero since
d is the maximum value of x and therefore \(d \ge \bar x\). For \(d > \bar x\), this establishes the
fact that we have at least one change of sign.

If we let \(i_0\) be the smallest value of i for which the coefficient of \(\lambda^{i}\) is positive,
then \(d \ge i_0 \ge 1\), and \([{}_{d}P_{d+1-i_0} - \bar x\, {}_{d}P_{d-i_0}] > 0\). Since \({}_{d}P_{d+1-i} \ge {}_{d}P_{d-i}\) except
for i = 0, it follows that \([{}_{d}P_{d+1-i} - \bar x\, {}_{d}P_{d-i}] > 0\), for all \(d \ge i > i_0\).

Accordingly there is but one change of sign, and therefore exactly one positive
root for the case where \(d > \bar x\). When \(d = \bar x\), which can happen only if each
sample observation has the value d, (5) reduces to a polynomial equation of
degree d - 1 in \(\lambda\) with no positive root, and in this case, the maximum likelihood
estimate \(\hat\lambda\) fails to exist.

When it is necessary to have a more precise estimate of \(\lambda\) than can be obtained
by interpolation from the accompanying tables, any one of various standard
iterative methods for determining the real roots of polynomial equations might
be employed in solving for the positive root of equation (5) to as many significant
digits as desired, using the interpolated value of \(\hat\lambda\) as a first approximation.
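The polynomial form (5) lends itself to a short numerical check of the sign-change argument and of the positive root. The sketch below is an illustration under stated assumptions: it uses `math.perm` (Python 3.8 or later, which returns zero when r exceeds n, matching (6)), it finds the root by bisection rather than by any particular method named in the paper, and the function names are hypothetical.

```python
import math

def poly_coefficients(xbar, d):
    """Coefficients a_0, ..., a_d of (5): a_i = dP(d+1-i) - xbar * dP(d-i)."""
    return [math.perm(d, d + 1 - i) - xbar * math.perm(d, d - i)
            for i in range(d + 1)]

def sign_changes(coeffs):
    """Number of sign changes in the coefficient sequence, ignoring zeros."""
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(a != b for a, b in zip(signs, signs[1:]))

def positive_root(coeffs, tol=1e-10):
    """Positive root of the polynomial, assuming a_0 < 0 and a_d > 0."""
    def p(lam):
        acc = 0.0
        for c in reversed(coeffs):   # Horner's rule
            acc = acc * lam + c
        return acc
    lo, hi = 0.0, 1.0
    while p(hi) <= 0:                # expand until the polynomial turns positive
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    a = poly_coefficients(3.7750, 8)
    print(sign_changes(a), round(positive_root(a), 3))
```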


3. VARIANCE OF THE ESTIMATE
The asymptotic variance of \(\hat\lambda\) may be expressed as

\[
V(\hat\lambda) = -\bigl[E(\partial^{2} L / \partial\lambda^{2})\bigr]^{-1}, \tag{7}
\]


where E symbolizes expected value, and L designates the logarithm of the 
likelihood function. 





Table 1. MAXIMUM LIKELIHOOD ESTIMATION FUNCTION \(\bar x = \lambda[F(d-1)/F(d)]\).
Each block is headed by its values of \(\lambda\); successive rows within a block correspond to d = 1, 2, 3, ...


WN -005 -O1 -02 -03 -04 -05 -06 -07 -08 


-00498 -00990 01961 -02913 04762 -05660 - 06542 .07407 
-02999 04994 05990 - 06984 -07976 
-07999 


ol +2 +3 4 


xz 


-09091 - 16667 +23077 . 28571 
-09955 19672 - 28996 37838 
-09998 -19978 -29900 -39714 
-19999 -29993 . 39971 

- 39998 




x 


1.1 1.2 1.3 . 1.5 1.6 1.7 1.8 1.9 2.0 


- 52381 54545 56522 -60000 -61538 -62963 -64286 -65517 - 66667 
85397 90411 -95071 1.03448 1.07216 1.10736 -14027 1.17109 1.20000 
1.01663 1.09227 1.16443 1.29851 -36061 1,41957 -47552 1.52860 1.57895 
1.07754 1.16853 1.25738 1.42806 -50965 1.58862 -66494 1.73857 1.80952 
1.09508 1.19249 1.28901 1.47873 57160 1.66296 -75266 1.84058 1.92661 


1.09910 19850 1.29762 1.49470 -59246 1.68957 - 78591 -88137 1.97583 
1.09986 -19974 1.29956 1.49887 -59828 1.69747 - 79638 -89496 1.99312 
1.09998 -19996 1.29993 1.49979 1.59966 1.69946 -79919 -89880 1.99828 
-19999 1,29999 1.49996 1.59994 1,69990 - 79984 -89975 1.99962 

1.49999 1.59999 1.69998 - 79997 -89995 1.99992 


-89999 1.99999 



2.1 2.2 . 2.5 2.6 2.7 2.8 2.9 3.0 


a 


-67742 -68750 - 70588 -71429 -72222 -72973 - 73684 -74359 -75000 
1.22714 1.25267 1.29936 - 32075 -34098 1.36011 1.37824 1.39543 1.41176 
1.62671 1.67202 1.75587 -79458 -83141 1.86641 1.89969 1.93137 1.96154 
1.87781 1.94347 2.06711 -12521 -18094 2.23436 2.28557 2.33465 2.38168 
2.01065 2.09264 2.25019 - 32567 -39894 2.46998 2.53879 2.60541 2.66984 


2.06919 2.16132 2.34153 42941 -51570 2.60031 2.68319 -76428 2.84353 
2.09080 2.18791 . 2.38012 -47504 -56906 2.66209 2.75404 -84484 2.93441 
2.09759 2.19668 . 2.39405 -49223 -58998 2.68727 2.78401 -88014 2.97560 
2.09944 2.19919 ° 2.39841 -49784 -59711 2.69619 2.79503 -89362 2.99189 
2.09988 2.19982 . 2.39962 -49946 -59925 2.69897 2.79861 -89815 2.99757 


2.09998 2.19996 . 2.39992 -49988 -59982 2.69975 2.79965 -89951 2.99934 
2.19999 . 2.39998 -49997 -59996 2.69994 2.79992 2.99983 

-59999 2.69999 2.79998 . 2.99996 
2.99999 




3.1 3.2 3.3 ° 3.5 3.6 3.7 3.8 3.9 4.0 


-75610 -76190 - 76744 . 77778 - 78261 - 78723 - 79167 -79592 -80000 
1.42729 1.44206 1.45613 -48235 1.49459 1.50628 1.51747 1.52819 1.53846 
1.99028 2.01768 2.04382 -09262 2.11540 2.13720 2.15806 2.17804 2.19718 
2.42675 2.46994 2.51133 -58905 2.62553 2.66053 2.69411 2.72634 2.75728 
2.73212 2.79229 2.85040 -96061 3.01282 3.06318 3.11175 3.15858 3.20373 


2.92091 2.99639 3.06996 -21131 3.27910 3.34498 3.40896 3.47108 3.53135 
3.02267 3.10955 3.19500 -36137 3.44220 3.52140 3.59895 3.67483 - 74900 
3.07032 3.16423 3.25725 -44038 3.53036 3.61920 3.70684 3.79323 - 87832 
3.08981 3.18733 3.28440 -47697 3.57236 3.66708 3.76107 3.85428 - 94664 
3.09684 3.19595 3.29486 -49196 3.59008 3.68786 3.78526 3.88225 97877 


3.09911 3.19882 3.29846 -49744 3.59676 3.69592 3.79492 3.89372 -99229 
3.09977 3.19969 3.29958 -49925 3.59903 3.69874 3.79839 3.89796 - 99743 
3.09995 3.19992 3.29989 -49980 3.59973 3.69964 3.79953 3.89939 -99921 
3.09999 3.19998 3.29997 -49995 3.59993 3.69991 3.79987 3.89983 -99977 

3.29999 -49999 3.59998 3.69998 3.79997 3.89996 -99994 


3.69999 3.79999 3.89999 -99999 




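It follows from (7) and from the expression given in [1] for \(\partial^{2} L/\partial\lambda^{2}\) that

\[
V(\hat\lambda) \sim \frac{\lambda}{n}\,\psi_d(\lambda), \tag{8}
\]

where

\[
\psi_d(\lambda) = \frac{[F(d)]^{2}}{F(d-1)\,[F(d) + \lambda f(d)] - \lambda F(d)\, f(d-1)}. \tag{9}
\]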





Table 1 (Continued)


a 


4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 


- 80392 -80769 -81132 -81481 -81818 -82143 - 82456 - 82759 -83051 
1.54832 1.55777 1.56687 1.57559 1.58400 1.59209 1.59987 1.60739 1.61463 
2.21555 2.23316 2.25008 2.26633 2.28195 2.29697 2.31143 2.32535 2.33877 
2.78700 2.81556 2.84301 2.86941 2.89480 2.91923 2.94276 2.96541 2.98724 
3.24727 3.28925 3.32972 3.36876 3.40640 3.44272 3.47775 3.51155 3.54417 


3.58981 3.64649 3.70143 3.75467 3.80625 3.85621 3.90460 3.95146 3.99683 
3.82147 3.89223 3.96127 4.02860 4.09423 4.15817 4.22044 4.28105 4.34003 
3.96206 4.04441 4.12533 4.20479 4.28277 4.35925 4.43419 4.50760 4.57946 
4.03811 4.12862 4.21814 4.30659 4.39395 4.48015 4.56517 4.64896 4.73148 
4.07478 4.17023 4.26508 4.35928 4.45278 4.54552 4.63747 4.72858 4.81880 


4.09062 4.18867 4.28639 4.38377 4.48076 4.57733 4.67344 4.76904 4.86409 
4.09680 -19604 4.29513 4.39406 4.49280 4.59133 4.68962 4.78765 4.88538 
4.09899 -19872 4.29839 4.39799 4.49751 4.59693 4.69625 4.79544 4.89450 
4.09970 -19962 4.29951 4.39937 4.49920 4.59899 4.69874 4.79844 4.89807 
4.09992 -19989 4.29986 4.39981 4.49976 4.59969 4.69961 4.79950 4.89937 


4.09998 -19997 4.29996 4.39995 4.49993 4.59991 4.69988 4.79985 4.89981 
4.19999 4.29999 4.39999 4.49998 4.59998 4.69997 4.79996 4.89994 
4.59999 4.69999 4.79999 4.89998 




5.5 6.0 6.5 ° . 9.0 10.0 11.0 12.0 


-84615 85714 - 86667 ° - 88889 + 90000 - 90909 -91667 -92308 
1.65316 1.68001 1.70306 1.75611 1.78195 1.80211 1.82165 1,83027 
2.40527 2.45902 2.50099 2.59629 2.64230 2.67932 2.70974 - 73543 
3.10291 3.18261 3.25040 3.40290 3.47576 3.53344 3.58020 -61800 
3.71768 3.83760 3.93959 ° 4.16793 4.27585 4.36048 4.42837 -48376 


4.24038 4.41047 4.55585 4.88198 5.03535 5.15483 5.24988 - 32689 
4.66123 4.88967 5.08713 . 5.53468 5.74574 5.90960 6.03919 -14343 
4.97807 5.26874 5.52435 . 6.11544 6.39758 6.61682 6.78969 -92814 
5.19853 5.54913 5.86428 . 6.61487 6.98130 7.26792 7.49415 -67489 
5.33904 5.74115 6.11148 . 7.02671 7.48833 7.85418 8.14462 - 37689 


5.42068 5.86205 6.27825 . 7.34969 7.91261 8.36768 8.73307 -02681 
5.46388 5.93181 6.38207 . 7.58875 8.25222 8.80261 9.25216 -61719 
5.48476 5.96869 6.44156 ° 7.75468 8.51046 9.15661 9.69634 10.1412 
5.49402 5.98661 6.47298 . 7.86223 8.69593 9.43181 10.0630 10.5935 
5.49781 5.99465 6.48831 . 7.92719 8.82118 9.63503 10.3532 10.9712 


5.4€925 5.99799 6.49526 . 7.96376 8.90053 9.77698 10.5726 11.2750 
5.49976 5.99929 6.49819 6. 7.98298 8.94764 9.87051 10.7302 11.5092 
5.49993 5.99976 6.49935 . 7.99244 8.97390 9.92858 10.8376 11.6815 
5.49998 5.99993 6.49978 .6. 7.99682 8.98765 9.96255 10,9068 11.8021 
5.49999 5.99998 6.49993 . 7.99873 8.99445 9.93131 10.9490 11.8825 


5.99999 6.49998 . -7,99952 8.99762 9.99111 10.9733 11.9332 
6.49999 . 7.99982 8.99903 9.99596 10.9867 11.9637 

7.99994 8.99962 9.99824 10.9936 11.9811 

7.99998 8.99986 9.99927 -9971 11.9905 

7.99999 8.99995 9.99971 10.9987 11.9955 


8.99998 9.99989 -9995 11.9979 
8.99999 9.99996 -9998 11.9991 
9.99999 -9999 11.9996 

11.9998 

11.9999 




In (9), \(f(x) = e^{-\lambda}\lambda^{x}/x!\).
For the special case in which d = 1, (9) may be simplified to yield

\[
\psi_1(\lambda) = (\lambda + 1)^{2}. \tag{10}
\]

In order to facilitate the computation of variances in practical applications,
selected entries of \(\psi_d(\lambda)\) are included in Table 2.


4. AN ILLUSTRATIVE EXAMPLE


To illustrate the practical application of these results, we consider a sample
of Rutherford and Geiger [10], consisting of observations of the number of α
particles observed during time intervals of one-eighth second duration modified
to exclude all records of intervals during which the number of particles exceeded




Table 2. THE VARIANCE FUNCTION \(\psi_d(\lambda) = [F(d)]^{2}/\{F(d-1)[F(d) + \lambda f(d)] - \lambda F(d)\, f(d-1)\}\).


* » 9 10 11 12 
NY 2 3 4 


-0000 
-0001 


-0000 


-0000 
-0006 
-0042 
-0125 
0265 


-0464 
-0725 
-1047 
-1429 
- 1869 


+2367 
- 5696 
-0395 
-6476 
3970 


+2915 
-3344 


-0000 


-0000 
-0000 
-0003 
-0012 
-0033 


-0072 
-0132 
-0219 
-0334 
0480 


0658 
- 2067 
4365 
7574 
-1723 
- 6846 


-2978 
-0153 


g 
8 


0000 -0000 


0000 
-0000 
- 0000 
-0000 
-0000 


-0000 


-0482 
-0997 


32223 5888 


5640 




- 8099 




-9412 
-131 - 5287 
- 564 - 8767 - 8399 
-161 -042 -8205 


+740 -841 9.2556 
+306 -283 12.157 
- 862 -374 15.531 10. 
408 +117 19.386 13. 
-07 68.705 45.977 32 


* For d = 1, \(\psi_1(\lambda) = (\lambda + 1)^{2}\).


8. This same illustrative example was employed in [1], and the data are
summarized as follows: \(\bar x = 3.7750\), n = 2565, d = 8. Entering Table 1 with d = 8,
we locate the pair of successive entries which brackets \(\bar x = 3.7750\) and
interpolate linearly as summarized below.

  λ        x̄

3.800    3.7068
3.879    3.7750
3.900    3.7932


Accordingly we have \(\hat\lambda = 3.879\), which value agrees with the more laboriously
calculated result given in [1].


For the asymptotic variance, we interpolate in Table 2 to obtain

\[
\psi_8(3.879) = 1.165.
\]

From (8) we calculate

\[
V(\hat\lambda) = (3.879/2565)(1.165) = 0.0018,
\]

and finally

\[
\sigma_{\hat\lambda} = \sqrt{V(\hat\lambda)} = 0.04.
\]
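The variance function (9) can also be computed directly instead of interpolated from Table 2. The sketch below is illustrative, with hypothetical function names; a direct computation gives a value of the variance function slightly below the interpolated 1.165, which does not change the rounded variance or standard deviation.

```python
import math

def psi(lam, d):
    """Variance function (9) for terminus d >= 1."""
    f = [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(d + 1)]
    F_d = sum(f)            # F(d)
    F_dm1 = F_d - f[d]      # F(d - 1)
    return F_d ** 2 / (F_dm1 * (F_d + lam * f[d]) - lam * F_d * f[d - 1])

def asymptotic_variance(lam, n, d):
    """Equation (8): V = (lam / n) * psi_d(lam)."""
    return lam / n * psi(lam, d)

if __name__ == "__main__":
    v = asymptotic_variance(3.879, 2565, 8)
    print(round(psi(3.879, 8), 3), round(v, 4), round(math.sqrt(v), 2))
```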


The assistance of Mr. Robert Everett in computing the tables for this paper 
is gratefully acknowledged. 


REFERENCES 


[1] Cohen, A. C., Jr. (1954). Estimation of the Poisson parameter from truncated samples
and from censored samples. J. Am. Stat. Assn. 49, 158-68.
[2] Cohen, A. C., Jr. (1960). Estimating the parameter in a conditional Poisson distribution.
Biometrics 16, 203-11.
[3] David, F. N., and Johnson, N. L. (1952). The truncated Poisson. Biometrics 8, 275-85.
[4] Finney, D. J., and Varley, G. C. (1955). An example of the truncated Poisson
distribution. Biometrics 11, 387-94.
[5] Irwin, J. O. (1959). On the estimation of the mean of a Poisson distribution from a
sample with the zero class missing. Biometrics 15, 324-26.
[6] Molina, E. C. (1942). Poisson's Exponential Binomial Limit. D. Van Nostrand Co., Inc.
[7] Moore, P. G. (1952). The estimation of the Poisson parameter from a truncated
distribution. Biometrika 39, 247-51.
[8] Plackett, R. L. (1953). The truncated Poisson distribution. Biometrics 9, 485-88.
[9] Rider, Paul R. (1953). Truncated Poisson distributions. J. Am. Stat. Assn. 48, 826-30.
[10] Rutherford, E., and Geiger, Hans. (1910). The probability variations in the
distribution of α particles. Phil. Magazine Series 6, 20, 698.
[11] Tate, R. F., and Goen, R. L. (1958). Minimum variance unbiased estimation for the
truncated Poisson distribution. Ann. Math. Stat. 29, 755-65.





Vol. 3, No. 3 TECHNOMETRICS August, 1961


Book Reviews 


“Applied Statistical Decision Theory” by H. Raiffa and R. Schlaifer. Division of
Research, Harvard Business School (1961), 356 pp., $9.50.


Anyone acquainted with R. Schlaifer's “Probability and statistics for business decisions”
should be able to open this book almost anywhere and recognize the terrain. For instance, 
he will be surprised neither at the exclusion of conventional statistical tables nor at their 
replacement by tables of the density function of Student’s ¢ and of ¢(u) — u ¢(u) for the 
normal distribution. 

However, this book contrasts with Schlaifer's in its much stronger emphasis on
mathematical technique. Also in contrast is the almost complete absence of explicit criticism of the
grosser peccadillos of “classical” statistics, which perhaps reveals the authors' growing
confidence that their approach will become part of the routine statistical apparatus.

The Preface announces: ‘‘(This) book is an introduction to the mathematical analysis of 
decision making when the state of the world is uncertain but further information about it 
can be obtained by experimentation. (The) objective of such analysis is to identify a course 
of action (which may or may not include experimentation) that is logically consistent with 
the decision maker’s own preferences for consequences, as expressed by numerical utilities, 
and with the weights he attaches to the possible states of the world, as expressed by numerical 
probabilities. (The) purpose of the present book is not to discuss --- basic principles but to 
contribute to the body of analytical techniques --- needed if practical decision problems are 
to be solved in accordance with them.’’ Thus the book is completely ‘‘Bayesian’’ in spirit 
and method. 

It opens with an Introduction which is fortunately so well written that, taken with the 
extensive cross-referencing in the body of the text, the reader can maintain his bearings in 
the apparently inevitable prolixity of the later sections. The main contribution to “the body 
of analytical techniques” is outlined in Part I (74 pp) under the heading ‘“‘Conjugate Prior 
Distributions.” When the experimental data consists of observations of independently, 
identically distributed random variables and there is a sufficient statistic y of fixed dimen- 
sionality, conjugate distributions are given, apart from proportionality constants, by the 
likelihood functions for values of y for which the latter functions are integrable; more conjugate 
distributions may be constructed by extending the range of y and by allowing constants in 
the likelihood function to vary. (For example, the conjugate distributions for θ of the normal
process in which y has the N(θ, 1/n) distribution are the set {N(α, β) | -∞ < α < ∞,
0 < β < ∞} of distributions of θ.) The whole family F of conjugate distributions has the
important closure property that, for any y and any prior distribution from F, the posterior 
distribution is also in F. The advantages of this property lie with the resulting increase in 
mathematical tractability and become clear when five common data-generating processes 
(Bernoulli, Poisson, Normal, Multinormal and Normal Regression) are analysed with conjugate 
prior distributions. This analysis, occupying most of the book, is carried out with great detail 
and clarity. 

Part II (129 pp) deals with the commonly occurring case when the utility function is the
sum of two parts: a sampling component, determined by the cost of the experiment and the
particular data obtained, and a terminal component, determined by the appropriateness of
the chosen action to the actual “state of the world.” If, additionally, the terminal utility for
each action is linear in θ (a parameter representing the “state of the world”), the optimal
action is based on the posterior mean of θ. This specialisation includes such topics as (i) the
choice of sample-size in the two-action problem when the observations are either binomial
or normal with known variance (ii) the selection of the best of several normal processes (since
the analysis is Bayesian, not all the processes need to be sampled!). Part II ends with the
case where actions are equivalent to point-estimates of θ; both linear and quadratic loss
functions are considered.


Part III (143 pp) develops the distribution theory required in Parts I and II. As its length 
might indicate, it contains more than is strictly necessary and the authors hint at possible 
applications of the overflow to stratified sampling and normal regression experiments. 

Throughout Parts I and II there are supporting calculations of such quantities as the 
“expected value of perfect information” and the “expected value of sample information.” 
In addition, exemplary applications of the methods are made to problems drawn from industry, 
commerce and medicine. 

Have the authors succeeded in making a major contribution to the analytical techniques 
of Bayesian analysis? The answer to this will be decided by the experience of statisticians and 
others who use conjugate distributions to represent their prior knowledge; it will largely 
depend on the truth of the assertion (p 44) that, in certain commonly occurring situations, 
the family F of conjugate distributions will come close to being so rich that “there will exist
a member of F capable of expressing the decision-maker’s prior information and beliefs.”
It is a pity that more space was not devoted to explaining the grounds for this optimism— 
even if it meant pruning the multivariate analysis. Concerning the fitting of conjugate
distributions to prior knowledge, there is the statement on p 59 that “what experience we have
had with concrete examples convinces us that in the great majority of applications the
method of fitting will have absolutely no material effect on the results of the final analysis ... .”
On p 67, the analysis of a Bernoulli process leads to the conclusion that “the criticality of
the choice of a prior distribution depends crucially on the (observation).” In the same section,
sequential designs involving a high “stage-cost” are asserted to depend critically on the
prior distribution. §5.6.4 uses a normal distribution for the parameter of a Bernoulli process
to approximate the actual, conjugate prior distribution and shows that the change in the
recommended “optimal” sample-size affects the net gain only slightly. For the normal process
with unknown mean μ and unknown variance σ², a conjugate distribution allocates a gamma
distribution to 1/σ² and a conditional normal distribution with variance proportional to σ²
to μ; on p. xviii, the authors acknowledge that such a distribution “cannot give a good
representation of tight opinions about μ in the face of loose opinions about (σ).” Even if these
remarks all supported the initial assertion, they would not constitute an adequate basis for it. 

The degree and importance of the conflict between mathematical tractability and satis- 
factory agreement with prior knowledge remain uncertain. 

On all other fronts, the book is excellent. §3.3.3 has an interesting discussion of the pos- 
sibility and occasional appropriateness of assigning a probability distribution to a parameter 
directly on the basis of the data, thereby avoiding the use of Bayes’ Theorem. §1.4 emphasises 
that the assignment of a utility function is really that of the mean value of a utility function 
of wider domain. Chapter 6 contains a stimulating discussion of the Bayesian approach to 
estimation. 

It appears likely that this book will be the starting point of many lines of enquiry. Its 
basic philosophy is by no means new; for instance, on p 13 of an 1879 paper (Tidsskrift for 
Mathematik, Vol. 4, Ser. 3), F. Bing considered a Bayesian solution of the problem of deciding 
whether or not to buy a cargo of fruit after a sample has been taken. What is new is the
impressive seriousness of the present contribution. 

M. Stone 
Princeton Univ. 


“Time Series Analysis” by E. J. Hannan. pp. viii + 152. London: Methuen & Co.
Ltd., New York: Wiley & Sons Inc. 1960. 


This excellent little book is one of a new series of monographs on applied probability and 
statistics under the general editorship of M. S. Bartlett. What does need making clear to the 
prospective reader is the exact nature of the contents, since the title is slightly misleading. 
What Prof. Hannan has in fact done is to set out the mathematical principles underlying 
the spectral analysis of univariate stationary time-series in discrete time. Time-series generated 
by birth-and-death type processes, such as arise in biology and physics, are not mentioned, 
and the practical details of time series analysis are sometimes only hinted at. 

The first chapter establishes the basic mathematical results, starting with the circular
process in which “time” is restricted to a finite set of values and proceeding to the general
case. Chapters 2 and 3 discuss the estimation of the correlogram and the spectral density 
and distribution functions, chapter 4 describes hypothesis testing and the setting up of con- 
fidence limits, and chapter 5 deals with series in which the stochastic part is superimposed 
on a deterministic component. The level of mathematical sophistication is fairly high; in his 
preface, the author suggests that a knowledge of mathematics and statistics up to the level 
of Cramér’s “Mathematical Methods of Statistics” would be desirable in the reader, and
this is no understatement. 
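
For readers unfamiliar with the two quantities named above, the following sketch computes a sample correlogram and a raw periodogram for a short series. The estimators shown (divisor n for the autocovariances, periodogram ordinates at the Fourier frequencies) are common textbook choices and are assumptions here, not a statement of the estimators treated in the book.

```python
import math

def autocorrelations(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag, using divisor n."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d) / n
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]

def periodogram(x):
    """Raw periodogram at the Fourier frequencies 2*pi*j/n, j = 1..n//2."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    ordinates = []
    for j in range(1, n // 2 + 1):
        w = 2 * math.pi * j / n
        re = sum(d[t] * math.cos(w * t) for t in range(n))
        im = sum(d[t] * math.sin(w * t) for t in range(n))
        ordinates.append((w, (re * re + im * im) / (2 * math.pi * n)))
    return ordinates

# An alternating series has all its power at frequency pi.
series = [1.0, -1.0] * 4
r = autocorrelations(series, 2)
pg = periodogram(series)
peak_frequency = max(pg, key=lambda pair: pair[1])[0]
print(r, peak_frequency)
```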

The book, which is well produced at a reasonable price, is a first-rate introduction to 
spectral and autocorrelation theory for the mathematical statistician, though it cannot be 
regarded as an adequate guide to the practical application of these techniques. 

M. J. R. Healy 





Vol. 3, No. 3 TECHNOMETRICS August, 1961


Errata 


“Average Run Lengths in Cumulative Chart Quality Control Schemes” 
P. L. GOLDSMITH AND H. WHITFIELD, TECHNOMETRICS, Vol. 3, No. 1, February, 1961


In the last line but one above Figure 8 on page 19, the word ‘increases’
should be replaced by ‘decreases’ so as to read, “... positive correlation in the
process decreases the effectiveness of the cumulative chart control scheme ...”.







TECHNOMETRICS August, 1961


Statistical Programs for High Speed Computers


Assembled for Technometrics by


FRED C. LEONE


Case Institute of Technology


Readers are requested to send complete 
writeups of computer programs of interest to 
statisticians to Dr. Fred C. Leone, Director, 
Statistical Laboratory, Case Institute of 
Technology, 10900 Euclid Avenue, Cleveland 
6, Ohio, for consideration for publication in 
Technometrics. Each computer program 
announcement is to contain: 

1. A brief problem description, 2. The 
type of computer for which the program is 
applicable, 3. The author’s name and address, 
4. A brief description of the program in- 
cluding: Limitations, auxiliary equipment 
requirements, statements on accuracy, avail- 
ability of sample problems, running time 
estimates and program storage requirements.
All programs submitted for publication must 
be available for distribution subject only 
to nominal costs for the reproduction and 
mailing of program tapes, cards, etc. Inquiries 
about published programs should be ad- 
dressed to the authors. Comments and cri- 
tiques of the published programs may be sent 
to Dr. Leone and will be published when 
considered appropriate. 


Multiple Regression Analysis (TRAP), 6.0.030,
IBM 650 


This program is divided into two parts. 
The output of part I is stored on tape as 
input for part II. Part I computes (optional) 
in floating decimal point: the set of variables 
for each observation, the set of transformed 
variables for each observation, the set of 
terms made up of original and transformed 
values of variables for each observation, and 
an indication of how the hash total compares 
to the machine total of the observations. 

Part II computes and prints (optional) in 
floating decimal point: an indication of how 
the hash total compares to the machine 
total of observations, the original matrix of 
S(XᵢXⱼ) and S(XᵢYⱼ) in order by row, the
inverse matrix, the value of a₀ (for each
dependent variable), the values of aᵢ (i = 1
to n), the total variation, variation by
regression, correlation coefficient, R², degrees of
freedom, error of variance and the standard
deviation for each dependent variable,
“F-test” for each of the coefficients aᵢ (i = 1
to n), the value of Student’s “t” for each of
the coefficients aᵢ (i = 1 to n), a table of the
observed and calculated values for each 
dependent variable and their differences, 
the sum of the differences squared, the chi- 
square test, and a variance check for each 
dependent variable. 

The program requires: (Number of terms 
containing independent variables) + (Num- 
ber of dependent variables) must not exceed 
26; the number of dependent variables must 
not exceed 9; the magnitude of the variables 
(original or transformed) must be such that 
N log X < 100, where N is the largest power 
to which the variable appears in the poly- 
nomial being fitted; if a transformed variable 
is negative, it must not be raised to a frac- 
tional power because there is no provision 
in the program for complex numbers. 
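
The stated limits can be collected into a simple pre-run check, sketched below. The function and argument names are hypothetical, the limits of 999 observations and 32 variables are taken from the announcement's next paragraph, and the base of the logarithm in N log X < 100 is not given in the announcement; base 10 is assumed here.

```python
import math

def trap_limit_violations(n_terms, n_dependent, n_observations,
                          n_variables, max_abs_value, max_power):
    """List the published TRAP limits that a proposed job would exceed."""
    problems = []
    if n_terms + n_dependent > 26:
        problems.append("terms plus dependent variables exceed 26")
    if n_dependent > 9:
        problems.append("more than 9 dependent variables")
    if n_observations > 999:
        problems.append("more than 999 observations")
    if n_variables > 32:
        problems.append("more than 32 variables")
    # Assumption: the logarithm is to base 10.
    if max_abs_value > 0 and max_power * math.log10(max_abs_value) >= 100:
        problems.append("N log X is not below 100")
    return problems

print(trap_limit_violations(20, 5, 500, 10, 1000.0, 3))
print(trap_limit_violations(20, 9, 500, 10, 1000.0, 3))
```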

A maximum of 999 observations of 32 
positive or negative variables may be used. 
Special TRAP 533 and 407 boards are re- 
quired for input and output. Various options 
of the output can be selected. The program 
was successfully run. Both a sample problem 
and flow diagram are available. Mr. Donald 
Cashman, SHARE Distribution Agency, 
International Business Machines Corp., 590 
Madison Ave., New York 22, N. Y. 


Correlation Program, 1039, January 1961, 
Burroughs 220 (ALGOL) 


This correlation program was designed 
for a rapid calculation of the simple correla- 
tion coefficients where a minimal amount of 
accuracy was needed. It will compute the 
sums, sums of squares and cross products, 
means, standard deviations, and correlation
matrix for up to 80 3-digit variables and for
a maximum of 10,000 observations. The 
results are printed out on an on-line 407 
printer. This program requires that the 
Burroughs 220 be equipped with 5000 words 
of core storage and a cardatron with card 
input. This program was successfully run. 
A flow diagram and sample problems are 
available on request. Director, Statistical 
Laboratory, Case Institute of Technology, 
Cleveland 6, Ohio. 
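
The quantities listed in this announcement can all be obtained from one pass over the data, as the following sketch shows. It is an illustration in Python, not the Burroughs 220 ALGOL program; the divisor n - 1 for the standard deviations is an assumption, since the announcement does not state the divisor used.

```python
import math

def correlation_matrix(rows):
    """Means, standard deviations and correlations from sums and cross products."""
    n = len(rows)
    p = len(rows[0])
    sums = [0.0] * p
    cross = [[0.0] * p for _ in range(p)]
    for row in rows:  # single pass accumulating sums and raw cross products
        for i in range(p):
            sums[i] += row[i]
            for j in range(p):
                cross[i][j] += row[i] * row[j]
    means = [s / n for s in sums]
    # Corrected sums of squares and products.
    css = [[cross[i][j] - sums[i] * sums[j] / n for j in range(p)] for i in range(p)]
    sds = [math.sqrt(css[i][i] / (n - 1)) for i in range(p)]  # assumed divisor n - 1
    corr = [[css[i][j] / math.sqrt(css[i][i] * css[j][j]) for j in range(p)]
            for i in range(p)]
    return means, sds, corr

means, sds, corr = correlation_matrix([(1, 2), (2, 4), (3, 6)])
print(means, sds, corr)
```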


Multivariate Regression Analysis (97-03-001), 
RCA 501 

Provides the best linear predicting equation 
or model from either a minimal subset of the 
original variates or from the total set, at the 
user’s option. Output includes (1) the con- 
stant term in the regression equation, (2) the 
regression coefficients, (3) the F ratio of the 
squared regression coefficient to the estimated 
variance of the coefficient, (4) probability 
that the regression coefficient does not differ 
significantly from zero (where the level of 
significance is optional at 0.1%, 1%, 5%, 
10%, or 20%), and (5) standard error of
estimate. 

Additional options include polynomial 
regression on one or more variables and 
transformation of data. 

Limitations: 

1. The number of six digit observations
or data points (N): 3 < N < 262,143

2. The number of independent three 
digit variates < 999. 


A sample problem is available. Radio 
Corporation of America, Electronic Data 
Processing Div., Camden, N. J. 


Time Series Decomposition and Adjustment 
Program, PA#526 TV TSDA, IBM 704 


This program is written to adjust seasonal 
and irregular time series to a form that shows 
primarily the trend-cyclical movements. 
Seasonal factors, irregular fluctuations and 
many summary measures useful in time series 
analysis are computed in the process. 

The program requires: (1) 16K core 
memory, 4 tapes for running object program. 
Source program may be compiled on other 
machine configurations. (2) FORTRAN 
system tape to compile. Due to the size of 
the source program, it is necessary to compile 
in 3 parts. Instructions are used at the end 
of each part in compiling. This will allow 
transfer instructions to be added in absolute
for joining the 3 parts into one complete
program. (3) Program for a 16K machine 
will analyze a maximum of 30 years and a 
minimum of 7 years of monthly data. All 
time series must begin in January for proper 
alignment of output tables. Input data for 
series should not exceed 8 digits. Mr. Donald 
Cashman, SHARE Distribution Agency, 
International Business Machines Corp., 590
Madison Ave., New York 22, N. Y. 

Reference: ‘Electronic Computers and 
Business Indicators’, Julius Shiskin, Paper 
57, National Bureau of Economic Research, 
Inc., New York, N. Y. 


Factor Analysis, BIMED 017, IBM 709, 
October 1960. 


Computes means and standard deviations, 
correlation coefficients, eigenvalues including 
cumulative proportions of total variance, 
eigenvectors, factor matrix, factor check 
matrix, rotated factor matrix, original and 
successive variances and check on communali- 
ties. The program can perform a different 
transformation of each variable. It performs 
a factor analysis for any of the following 
three cases: (1) The number of factors to be 
rotated is equal to the number of eigenvalues 
of the correlation matrix (with ones in the 
diagonal) which are greater than unity, (2) 
The number of factors to be rotated is equal 
to the number of eigenvalues of the correla- 
tion matrix (with ones in the diagonal) which 
are greater than zero, (3) The number of 
factors to be rotated is equal to the number 
of eigenvalues which are greater than zero 
of the correlation matrix modified by insertion 
of the squared multiple correlation coefficients 
(of each variable on the remaining variables) 
in the diagonal. 
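
The counting rules in cases (1) and (2) can be sketched as below, given eigenvalues that have already been computed. The eigenvalues shown are invented for illustration, and the helper names are hypothetical rather than part of BIMED 017.

```python
def factors_to_rotate(eigenvalues, threshold):
    """Number of eigenvalues strictly greater than the threshold."""
    return sum(1 for ev in eigenvalues if ev > threshold)

def cumulative_proportions(eigenvalues):
    """Cumulative proportions of total variance, largest eigenvalue first."""
    total = sum(eigenvalues)
    running = 0.0
    proportions = []
    for ev in sorted(eigenvalues, reverse=True):
        running += ev
        proportions.append(running / total)
    return proportions

# Hypothetical eigenvalues of a 4 x 4 correlation matrix (they sum to 4).
eigenvalues = [2.4, 1.1, 0.5, 0.0]
print(factors_to_rotate(eigenvalues, 1.0))  # case (1): greater than unity
print(factors_to_rotate(eigenvalues, 0.0))  # case (2): greater than zero
print(cumulative_proportions(eigenvalues))
```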

Limitations: 

1. The maximum number of variables 
which can be processed by this pro- 
gram is 80. 

2. There is no limit on sample size. 


The above program allows a variable input 
format. A sample problem is included. 
Division of Biostatistics, Department of 
Preventive Medicine and Public Health, 
School of Medicine, U. C. L. A., Los Angeles 
24, California. (Prof. W. J. Dixon) 


Centroid Factor Analysis Program, Univac 
Scientific 1103 


This program produces from an nth order
matrix of correlations R = [rᵢⱼ] a set of r
linearly independent effects or factors. It also
calculates the factor matrix. This program has
the option of starting with the users’ estimates
of the communalities (values to be placed in
the principal diagonal) or of using the largest
element in each row of R as the estimate.
After extraction of the first factor, the largest 
element in each row of R is used automat- 
ically. The program capacity is 149 variables. 
Floating point arithmetic is used and a 
maximum of eight significant decimal digits 
is carried throughout the calculation. Mr. 
Paul Minton, Director, Computing Labora- 
tory, Southern Methodist University, Dallas, 
Texas. 
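
The extraction of a first centroid factor can be sketched as follows, using the usual textbook form of the centroid method: each loading is a column sum divided by the square root of the grand total. This is an illustration, not the Univac program. Taking the largest absolute off-diagonal element of each row as the communality estimate is an assumption about what “largest element” means, and the sign reflection needed before extracting later factors is omitted.

```python
import math

def first_centroid_factor(corr):
    """Loadings of the first centroid factor and the residual matrix."""
    n = len(corr)
    r = [row[:] for row in corr]
    # Communality estimate: largest absolute off-diagonal element in each row.
    for i in range(n):
        r[i][i] = max(abs(r[i][j]) for j in range(n) if j != i)
    col_sums = [sum(r[i][j] for i in range(n)) for j in range(n)]
    total = sum(col_sums)  # assumed positive
    loadings = [s / math.sqrt(total) for s in col_sums]
    residual = [[r[i][j] - loadings[i] * loadings[j] for j in range(n)]
                for i in range(n)]
    return loadings, residual

# Three variables sharing one common factor with loading sqrt(0.6).
R = [[1.0, 0.6, 0.6],
     [0.6, 1.0, 0.6],
     [0.6, 0.6, 1.0]]
loadings, residual = first_centroid_factor(R)
print(loadings)
```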


Lattice Designs, 06.4.005.1, IBM 650 


Computes the analysis of variance, treat- 
ment means, adjusted treatment means, 
variance of treatment differences and effi- 
ciency for rectangular or square lattices. The 
lattices may be balanced or unbalanced and 
may contain any number of groups (squares 
or rectangles). 

Limitation: 


niro(l + 1) + 21+ 1) +11 + kg + 1)) 
< 951 andr, < 15 
where g = 1;/3 (increased to the next 
whole number). 
r₁ = number of groups, i.e. number of
squares or rectangles in basic design.
r₂ = number of repetitions, i.e. number of
times basic design is repeated.
k = number of plots per block.
l = number of blocks per group.


Dr. Arnold Grandage, Dept. of Experimental 
Statistics, North Carolina State College, 
Box 5457, Raleigh, North Carolina. 


Analysis of Covariance, F4-166, LGP-30 


Computes the analysis of covariance for a 
completely randomized experiment (one-way 
classification). The program also gives the 
adjusted treatment means and their standard 
errors. An optional transformation of the 
data to 1000 log x and/or 1000 log y is
included. Double precision fixed point 
arithmetic is used. The program permits any 
number of treatments (k) provided the data 
may be stored sequentially on the drum. For 
the amount of data permitted, the only 
restriction is the amount of memory avail- 
able. All data must be non-negative. 


Limitations: 


1. Data must be four decimal digits 
in length. 

2. }oz and }-y must be less than 2”. 

3. All corrected sums of squares and 
sums of products which are printed 
as output must be less than 2”. 

4. All covariance slopes (coded) for
individual treatments must be less 
than 16. 


Sample problem, flow charts and program 
listing are included in the writeup. POOL, 
1532 N. Cahuenga Blvd., Los Angeles, 
California. 


Quantal Response Analysis (Probit Analysis), 
MERCURY, November 1958 

The aim of the memorandum is to outline 
the Methods of Quantal Response Analysis 
as well as the program instructions. Section I 
describes the method of the analysis and the 
statistical tests applied to the results obtained. 
Sections II and III describe details of the use 
of the Mercury Program and details of the 
statistics described earlier. The program is 
available as a binary punched paper tape. 
After the completion of each case the com- 
puter returns to the beginning of the program 
and stops to read another tape. The program 
includes the variance of the probit, the 0% 
and 100% responses test for goodness of fit 
and least squares fit. Several transformations 
of the data are available. Mr. B. E. Cooper, 
Theoretical Physics, Atomic Energy Research 
Establishment, Harwell, Didcot, Berks, Eng- 
land. 
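
The transformation at the centre of probit analysis can be sketched as below. The addition of 5 to the normal deviate follows the classical probit convention and is an assumption here; the announcement does not say which convention the Mercury program uses. The sketch also shows why 0% and 100% responses need separate treatment: their probits are not finite.

```python
from statistics import NormalDist

def probit(p):
    """Classical probit: the standard normal deviate of p, plus 5."""
    if not 0.0 < p < 1.0:
        raise ValueError("probit is undefined for 0% and 100% responses")
    return 5.0 + NormalDist().inv_cdf(p)

print(probit(0.5), probit(0.975))
```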


Paired Comparisons from Balanced Incomplete 
Blocks—A 6-31 DESIGN, PARCOBIB, IBM 
650, July 1959 

This program is prepared to analyze 
results from a questionnaire involving 31 
objects, arranged in 31 sets, or blocks, of 6 
objects each, according to a balanced in- 
complete block design and gives the paired 
comparisons matrix and scale values deter- 
mined from this matrix. A variant of design 
11.40 or of design 13.14, as specified in 
Cochran and Cox Experimental Designs 
(Wiley, N. Y.), is used for the questionnaire 
appropriate to this program. The program 
will handle a maximum of 999 subjects in a 
single group. Fixed point arithmetic is used 
throughout. Proportions are rounded to four 
decimals. The approximations for the normal
deviate, arc sine, and logistic have a maximum
discrepancy of .0005 for proportions between
.98 and .02. The least squares solution for
scale values is used. Scale values are com- 
puted, using the normal deviate, the arc sine, 
and the logistic transform. The program 
processes each subject in about 35 seconds. 
Mr. Harold Gulliksen, Educational Testing 
Service, 20 Nassau Street, Princeton, New 
Jersey. 
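
A least squares scaling of a paired comparisons matrix can be sketched as follows, assuming the familiar solution in which each scale value is the column mean of the transformed proportions. The proportions are invented, the arc sine transform is omitted because its exact form is not given in the announcement, and the code is an illustration rather than the PARCOBIB program.

```python
import math
from statistics import NormalDist

def normal_deviate(p):
    """Standard normal deviate corresponding to proportion p."""
    return NormalDist().inv_cdf(p)

def logistic_transform(p):
    """Log odds of proportion p."""
    return math.log(p / (1.0 - p))

def scale_values(pref, transform):
    """Column means of the transformed proportions, diagonal counted as zero.

    pref[i][j] is the proportion of subjects preferring object j to
    object i; diagonal entries are ignored.
    """
    n = len(pref)
    return [sum(transform(pref[i][j]) for i in range(n) if i != j) / n
            for j in range(n)]

# Hypothetical proportions for three objects.
pref = [[0.5, 0.7, 0.9],
        [0.3, 0.5, 0.7],
        [0.1, 0.3, 0.5]]
s_normal = scale_values(pref, normal_deviate)
s_logistic = scale_values(pref, logistic_transform)
print(s_normal, s_logistic)
```

Both transforms give scale values that sum to zero here and place the three objects in the same order.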




Analysis of Variance for a Randomized Com- 
plete Block Design with Unequal Observations 
in the Subclasses, BU 220, CAP 


Where subclasses have unequal numbers 
of observations, but are proportional to each 
other, the mean square estimate of variance 
is computed and Snedecor’s F-test applied. 
If the subclasses have disproportionate 
numbers, the interaction sum of squares is 
adjusted by fitting constants and those of the 
treatments and blocks by weighted means. 
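
Whether a table of subclass numbers is proportional can be checked as sketched below, using the standard condition that every subclass number equals its row total times its column total divided by the grand total. The function name and the example tables are hypothetical.

```python
def is_proportional(counts):
    """True if n_ij * n_.. == n_i. * n_.j for every subclass (integer counts)."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    total = sum(row_totals)
    return all(
        counts[i][j] * total == row_totals[i] * col_totals[j]
        for i in range(len(counts))
        for j in range(len(counts[0]))
    )

print(is_proportional([[2, 4], [3, 6]]))  # proportional subclass numbers
print(is_proportional([[2, 4], [3, 5]]))  # disproportionate subclass numbers
```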

Restrictions: 


1. Each treatment must be equally
replicated.
2. k < 99, n < 99 (k = number of
blocks, n = number of treatments).
3. 3kn < 2500.
4. 2(kn + m*) < 2500 (m is maximum
value, either k or n).
5. Each subclass must have at least one
observation and less than 10°.


The approximate time will be the sum of 
the input time, 8.5 seconds, and both the 
following formulae: 


A = [4(k® + 3n*) X 1074] sec. 
B = [0.4(k + n)] sec. 


A sample problem is available. Director,
Computing Center, Cornell University,
Ithaca, New York.


Discriminant Analysis, BIMED No. 004, 
IBM 709, FORTRAN, October 15, 1959 


This program computes mean scores, the 
matrix of cross products of deviation from 
means, dispersion matrix, inverse of disper- 
sion matrix, coefficients and constants, 
evaluation of classification function for each 
individual and classification of matrix. The 
largest number of groups which can be 
processed by this program is 5. The largest 
number of variables is 25 which must be 
the same for all groups. Maximum group 
size is 150. The sample size may be different 
from one group to another. A sample problem 
is available. Professor W. J. Dixon, Divi- 
sion of Biostatistics, Dept. of Preventive 
Medicine and Public Health, School of 
Medicine, U. C. L. A., Los Angeles 24, 
California. 


CORRECTION 


The Balanced Lattice Square, October 1960, 
IBM 650, on page 302 of the May issue, 
should read: 


The Balanced Lattice Square, October 1960, 
Burroughs 220 





TECHNOMETRICS August, 1961


NOTICES 


The Editor of Technometrics recently accepted the post of Associate Professor 
in the Department of Chemical Engineering at Princeton University. Beginning 
September 1961, all mail addressed to the Editor should be sent to 


J. S. Hunter 

Department of Chemical Engineering 
Princeton University 

Princeton, New Jersey. 


1961 ANNUAL MEETING OF THE AMERICAN STATISTICAL ASSOCIATION 


Nearly 50 sessions have been scheduled thus far for the Annual Meeting of the 
American Statistical Association to be held at the Roosevelt Hotel in New 
York City, December 27-30, 1961. However, suggestions, particularly for 
speakers or discussants, would still be welcomed, and should be addressed to the 
sectional program chairman, or to the chairman of the ASA Program Committee, 
George P. Hitchings, American Airlines, 100 Park Avenue, New York City. 


Arthur M. Dutton, University of Rochester, is program chairman for the Bio- 
metrics Society-ENAR, as well as for the Biometrics Section of ASA, and 
Walter Smith, University of North Carolina, is program chairman for the 
Institute of Mathematical Statistics. Both of these organizations are meeting 
jointly with the ASA, as are also the American Economic Association, The 
American Marketing Association and other societies. 

The sessions planned for the Physical and Engineering Sciences Section
are:

1. Spectral Analysis in Geophysical Problems; 2. Statistical Problems in 
Satellite Tracking; 3. Computer Applications; 4. Experimental Statistics; 5. 
Non-Linear Regression. Sectional Program Chairman: Ray B. Murphy, Bell 
Telephone Laboratories, Inc., 463 West Street, New York 14, New York— 
CH 3-1000.

The sessions planned for the Training of Statisticians Section are:

1. Statistics Training at the Secondary School Level; 2. Audio-Visual
Aspects in the Teaching of Statistics; 3. Decision Theory in Basic Statistics
Training for Engineering and the Physical Sciences (Joint with Physical and
Engineering Sciences Section); 4. Decision Theory in Basic Statistics Training
for Business, Economics and the Social Sciences; 5. Statistics in Advertising 
and Marketing; 6. The Impact of the Computer on the Training of Statisticians. 
Sectional Program Chairman: Samuel B. Richmond, Graduate School of 
Business, Columbia University, New York 27, New York—UN 5-4000. 







INDUSTRIAL QUALITY CONTROL 
Vol. XVII, No. 12, June 1961


MANAGEMENT DEVELOPMENT THROUGH QUALITY CONTROL 
A.W. Wortham 


QUALITY IN THE AUTOMOTIVE INDUSTRY .......... Edward G. Budd, Jr. 


LIFE TEST—SOME PRACTICAL CONSIDERATIONS
H.J. Davis and B. P. Goldsmith 


ee A CeCe -BMMUE Lis sear as on owwknxtheme vee Ralph Von Osinski 
MODERN INSPECTION TECHNIQUES AND AUTOMATION    Edward A. Reynolds


SELECTION OF FLAVOR PANELS FOR COMPLEX FLAVOR DIFFERENCES 
Mae-Goodwin Tarver and Barbara Hall Ellis 


Vol. XVIII, No. 1, July 1961


PowER CHARACTERISTICS OF CONTROL CHARTS Edwin G. Olds 


A QUANTITATIVE APPROACH TO CLASSIFICATION OF CHARACTERISTICS 
Benjamin W. Marguglio and Martin W. Sullivan 


“VIDEOSONIC” SYSTEM INSTRUCTION RAISE QUALITY STANDARDS
David A. Hill and John J. Tamsen 


RELIABILITY—BOTH A TOOL AND OBJECTIVE IN DESIGN 
S. N. Greenberg and S. Zwerling 


RELATIONSHIP BETWEEN PROCUREMENT AND QUALITY CONTROL 
R. B. Walworth 


Industrial Quality Control is published monthly by the American Society for 
Quality Control, Inc. All correspondence concerning membership in the society and 
subscriptions to the journal should be addressed to American Society for Quality 
Control, Inc., Rm 6185 Plankinton Bldg., 161 West Wisconsin Ave., Milwaukee 3, 
Wisconsin. 





Journal of the 
AMERICAN STATISTICAL ASSOCIATION 


VOLUME 56 NUMBER 294
JUNE 1961 


TABLE OF CONTENTS 


Confidence Curves: An Omnibus Technique for Estimation 
and Testing Statistical Hypotheses Allan Birnbaum 
Changes in the Size Distribution of Dividend Income    Edwin B. Cox
Note on Curve Fitting with Minimum Deviations by 
Linear Programming Walter D. Fisher 
Bivariate Logistic Distributions    E. J. Gumbel
Partial Correlations in Regression Computations Robert L. Gustafson 
An Analysis of Consistency of Response in Household Surveys 
Carol M. Jaeger and Jean L. Pennock 
Multiple Regression Analysis of a Poisson Process    Dale W. Jorgenson
Factorial Treatments in Rectangular Lattice Designs 
Clyde Y. Kramer and Leroy S. Brenna 
Significance Tests in Discrete Distributions H. O. Lancaster 
Exact and Approximate Distributions for the Wilcoxon 
Statistic with Ties Shirley Y. Lehman 
The Use of Sample Quasi-Ranges in Setting Confidence 
Intervals for the Population Standard Deviation 
Fred C. Leone, C. W. Topp, and Y. H. Rutenberg 
Randomized Rounded-Off Multipliers in Sampling Theory 
M.N. Murthy and V. K. Sethi 
Unbiased Componentwise Ratio Estimation 
D. S. Robson and Chitra Vithayasai 
A Note on Measurement Errors and Detecting Real Differences 
Eugene Rogot 
A Quarterly Econometric Model of the United States 
Lowell E. Gallaway and Paul E. Smith 
On the Use of Partially Ordered Observations in Measuring 
the Support for a Complete Order R. F. Tate 
The Statistical Work of Oskar Anderson 
A Problem Concerned with Weighting of Distributions 
Coleridge A. Wilkins 


For further information, please contact: 


American Statistical Association 


1757 K Street, N.W. 
Washington 6, D. C. 





BIOMETRICS 
Journal of the Biometric Society 
Vol. 17, No. 2 CONTENTS June 1961 


On Additivity in the Analysis of Variance R. C. Elston 


A Note on Some Growth Patterns in a Simple Theoretical Organism 
J. A. Nelder 


The Partial Diallel Cross O. Kempthorne and R. N. Curnow


Sensory Testing by Triple Comparisons G. T. Park 


Rapid Chi-Square Test of Significance for Three-Part Ratios 
Charles E. Gates and Benjamin H. Beard 


Generating Unbiased Ratio and Regression Estimators W. H. Williams 


A Biometric Theory of Middle and Long Distance Track Records 
Malcolm E. Turner and Eleanor D. Campbell 


Small Sample Behavior of Slope Estimators in a Linear Functional Relation 
Martin Dorff and John Gurland 
Statistics for a Diagnostic Model 
Adrianus J. van Woerkom and Keeve Brodman 
Queries and Notes 
Three-Quarter Replicates of 2⁴ and 2⁵ Designs Peter W. M. John
On the Extension of Stevens’ Tables for Asymptotic Regression S. Lipton
Corrected Error Rates for Duncan’s New Multiple Range Test 
H. Leon Harter 
Multiple Comparisons between Treatments and a Control C.W. Dunnett 
Error Rates in Multiple Comparisons R. G. D. Steel 


Book Reviews 
D. S. Falconer: Introduction to Quantitative Genetics W. F. Bodmer
S. Goldberg: Probability: An Introduction F. N. David 


Biometrics is published quarterly. Its objects are to describe and exemplify the use of 
mathematical and statistical methods in biological and related sciences, in a form assimilable 
by experimenters. The annual non-member subscription rate is $7. Inquiries, orders for back 
issues and non-member subscriptions should be addressed to: 


BIOMETRICS 

Department of Statistics 
The Florida State University 
Tallahassee, Florida 





BIOMETRIKA 


Volume 48, Parts 1 and 2 June 1961 


CONTENTS 


Memoirs: 


KENDALL, M. G. Studies in the history of probability and statistics XI. Daniel Bernoulli 
on maximum likelihood. 


DAVID, F. N. AND MALLOWS, C. L. The variance of Spearman’s rho in normal samples.
FIELLER, E. C. AND PEARSON, E. S. Tests for rank correlation coefficients. II. 

DURBIN, J. Some methods of constructing exact tests.

HEATHCOTE, C. R. Preemptive priority queueing. 

HAJNAL, J. A two-sample sequential t-test. 

NABEYA, S. Absolute and incomplete moments of the multivariate normal distribution.


WHITE, JOHN S. Asymptotic expansions for the mean and variance of the serial correla-
tion coefficient. 


STARKS, T. H. AND DAVID, H. A. Significance tests for paired-comparison experiments.
Watson, G. S. Goodness-of-fit tests on a circle. 


GONIN, H. T. The use of orthogonal polynomials of the positive and negative binomial
frequency functions in curve fitting by Aitken’s method. 


VERHAGEN, A. M. W. The estimation of regression and error-scale parameters, when the 
joint distribution of the errors is of any continuous form and known apart from a 
scale parameter. 


MALLOWS, C. L. Latent vectors of random symmetric matrices.

HARTER, H. LEON. Expected values of normal order statistics.

HAIGHT, FRANK A. A distribution analogous to the Borel-Tanner. 

NICHOLSON, W. L. Occupancy probability distribution critical points. 

OKAMOTO, MASASHI AND ISHII, GORO. Test of independence in intraclass 2 × 2 tables.


Miscellanea: 


Contributions by M. Atiqullah, D. E. Barton, D. E. Barton and F. N. David, Colin 
R. Blyth and David W. Hutchinson, W. J. Ewens, J. Gani, J. C. Gower, M. J. R.
Healy and J. C. Gower, M. G. Kendall, K. C. S. Pillai and Angeles R. Buenaventura, 
M. M. Sondhi, J. C. Tanner, A. M. Walker. 


Reviews. Other Books received. Corrigenda. 


The subscription, payable in advance, is 54/- (or $8.00) per volume (including
postage). Cheques should be made payable to Biometrika, crossed “a/c Biometrika
Trust” and sent to the Secretary, Biometrika Office, University College, London,
W.C.1. All foreign cheques must be drawn on a Bank having a London agency. 


Issued by THE BIOMETRIKA OFFICE, University College, London 





THE ANNALS OF MATHEMATICAL STATISTICS 


Vol. 32, No. 2—June, 1961 


Contents 

Georges Darmois, 1888-1960 
The Existence and Construction of Balanced Incomplete Block Designs 
Random Allocation Designs II: Approximate Theory for 

Simple Random Allocation    A. P. Dempster
Sampling Moments of Means from Finite Multivariate Populations D. W. Behnken 
On the Foundations of Statistical Inference, I: Binary Experiments 
Some Extensions of the Idea of Bias 
Multivariate Correlation Models with Mixed Discrete and 

Continuous Variables    I. Olkin and R. F. Tate
Limits for a Variance Component with an Exact Confidence Coefficient W. C. Healy, Jr. 
Confidence Sets for Multivariate Medians    P. G. Hoel and E. M. Scheuer
Distribution Free Tests of Independence Based on the 

Sample Distribution Function    J. R. Blum, J. Kiefer, and M. Rosenblatt
Some Exact Results for One-Sided Distribution Tests 

of the Kolmogorov-Smirnov Type    P. Whittle
Some Extensions of the Wald-Wolfowitz-Noether Theorem Jaroslav Hájek
The Gap Test for Random Sequences Eve Bofinger and V. J. Bofinger 
The Multivariate Saddlepoint Method and Chi-Squared 

for the Multinomial Distribution    I. J. Good
A Generalization of Wold’s Identity with Applications to Random Walks H. D. Miller 
A Characterization of the Weak Convergence of Measures 
Exponential Bounds on the Probability of Error for 

a Discrete Memoryless Channel 
An Exponential Bound on the Strong Law of Large Numbers for Linear 

Stochastic Processes with Absolutely Convergent Coefficients    L. H. Koopmans
Expected Utility for Queues Servicing Messages with 

Exponentially Decaying Utility Frank A. Haight 
On the Coding Theorem for the Noiseless Channel Patrick Billingsley 


Notes: 


The Essential Completeness of the Class of Generalized 
Sequential Probability Ratio Tests 


A Problem in Survival 

First Passage Time for a Particular Gaussian Process 

A Note on the Ergodic Theorem of Information Theory 
Remark Concerning Two-State Semi-Markov Processes 


An Example of an Ancillary Statistic and the Combination of 
Two Samples by Bayes’ Theorem 


Abstracts of Papers 
News and Notices 


Publications Received 


The purpose of the Institute of Mathematical Statistics is to encourage the development, 
dissemination and application of mathematical statistics. Membership dues, which include a 
subscription to the Annals of Mathematical Statistics, are $10.00 per year for residents of the
United States or Canada and $5.00 a year for residents of other countries. Inquiries regarding 
membership to the Institute should be sent to the Secretary: G. E. Nicholson Jr., Department of 
Statistics, University of North Carolina, Chapel Hill, North Carolina. 





PREPARATION OF MANUSCRIPTS 


Manuscripts should be submitted to the office of the editor: J. S. Hunter,
Department of Chemical Engineering, Princeton University, Princeton, New 
Jersey. Each manuscript should be typewritten, double spaced, with wide 
margins at sides, top, and bottom. The original should be submitted with two 
additional copies, on paper that will take corrections. Dittoed or mimeographed 
papers are acceptable only if completely legible. Footnotes should be avoided 
and replaced by remarks in the text, or placed in an appendix. Preferably, 
references in the manuscript should appear as (Jones, A. B., 1958), and again 
later in alphabetical order in a list of references. Alternatively, references may 
be numbered, e.g. [1], as they appear in the manuscript and be listed in this 
sequence in the list of references. In the reference list, each reference should 
contain, in the order indicated, the name and initials of the author followed 
by those of the co-authors, date of publication, title of reference, source, volume 
number and page. References to books should include publisher’s name and
location. 

Figures, charts, and diagrams should be professionally drawn on plain white 
paper or tracing cloth in black India ink twice the size they are to be printed. 
A full page diagram, in print, measures 7.25 × 4.75 inches.

As far as possible, formulas should be typewritten and symbols not available 
on a typewriter carefully inserted in ink. Authors are asked to keep in mind the 
typographical difficulties of complicated mathematical formulae. The difference 
between capital and lower-case letters should be clearly shown; care should be 
taken to avoid confusion between such pairs as zero and the letter O, the numeral
1 and the letter l, the numeral 1 used as superscript and the prime (′), alpha and a, kappa
and k, mu and u, nu and v, eta and n, etc. Subscripts or superscripts should be
clearly below or above the line. Bars above groups of letters (e.g., $\overline{\log x}$) and
underlined letters are difficult to print and should be avoided. Symbols
are automatically italicized by the printer and should not be underlined on 
manuscripts. Boldface letters may be indicated by underlining with a wavy line 
on the manuscript; boldface subscripts and superscripts are not available. 
Complicated exponentials should be represented with the symbol exp, particularly
when appearing in the text; that is,

$\exp[(a^2 + b^2)^{1/2}]$ should be used in place of $e^{(a^2 + b^2)^{1/2}}$.

In writing square roots the fractional exponent is preferable to the radical sign.
Fractions in the body of the text (and when possible in displayed expressions)
and fractions occurring in the numerators or denominators of fractions are
preferably written with the solidus; thus

$(a + b)/(c + d)$ rather than $\frac{a + b}{c + d}$.


Authors will ordinarily receive only galley proofs. Fifty offprints without 
covers will be furnished free. Costs for additional reprints and covers can be 
furnished on request. 





CONTENTS


TECHNOMETRICS, Vol. 3, No. 3, AUGUST 1961 


The $2^{k-p}$ Fractional Factorial Designs
G. E. P. Box and J. S. Hunter


Partial Confounding in Fractional Replication ...... W. J. Youden


Finding New Fractions of Factorial Experimental
Designs ...... R. E. Fry


A Study of the Group Screening Method ...... G. S. Watson
Missing Values in Response Surface Designs ...... Norman R. Draper


The Optimum Allocation of Spare Components in
Systems ...... Donald F. Morrison


Use of Tables of Percentage Points of Range and
Studentized Range ...... H. Leon Harter


The Reliability of Components Exhibiting Cumulative 
Damage Effects 


An Analysis of Some Relay Failure Data from a 
Composite Exponential Population 
R. R. Prairie and B. Ostle 


Applications of Truncated Distributions in Process Start-ups
and Inventory Control ...... H. Smith and D. W. Grace


Estimating the Poisson Parameter from Samples that
Are Truncated on the Right ...... A. Clifford Cohen, Jr.


Book Reviews M. Stone and M. J. R. Healy 





